A Review of Cognitive Assistants for Healthcare: Trends,
Prospects, and Future Directions
SARAH MASUD PREUM, Department of Computer Science, University of Virginia
SIRAJUM MUNIR, Bosch Research and Technology Center
MEIYI MA, Department of Computer Science, University of Virginia
MOHAMMAD SAMIN YASAR, Department of Electrical and Computer Engineering, University of
Virginia
DAVID J. STONE, Departments of Anesthesiology and Neurosurgery, and the Center for Advanced
Medical Analytics, University of Virginia School of Medicine; MIT Critical Data, Laboratory
for Computational Physiology, Harvard-MIT Health Sciences and Technology, Massachusetts Institute
of Technology
RONALD WILLIAMS and HOMA ALEMZADEH, Department of Electrical and Computer
Engineering, University of Virginia
JOHN A. STANKOVIC, Department of Computer Science, University of Virginia
Healthcare cognitive assistants (HCAs) are intelligent systems or agents that interact with users in a context-aware and adaptive manner to improve their health outcomes by augmenting their cognitive abilities or complementing a cognitive impairment. They assist a wide variety of users, ranging from patients to their healthcare providers (e.g., general practitioners, specialists, surgeons), in several situations (e.g., remote patient monitoring, emergency response, robotic surgery). While HCAs are critical to ensure personalized, scalable, and efficient healthcare, there exists a knowledge gap in finding the emerging trends, key challenges, design guidelines, and state-of-the-art technologies suitable for developing HCAs. This survey aims to bridge this gap for researchers from multiple domains, including but not limited to cyber-physical systems, artificial intelligence, human-computer interaction, robotics, and smart health. It provides a comprehensive definition of HCAs and outlines a novel, practical categorization of existing HCAs according to their target user role and the underlying application goals. This survey summarizes and sorts existing HCAs based on their characteristic features (i.e., interactive, context-aware, and adaptive) and enabling technological aspects (i.e., sensing, actuation, control, and computation). Finally, it identifies critical research questions and design recommendations to accelerate the development of the next generation of cognitive assistants for healthcare.
Authors’ addresses: S. M. Preum, Department of Computer Science, Dartmouth College, Hanover, NH, 03755, USA; email:
spreum@dartmouth.edu; S. Munir, Bosch Research and Technology Center, Pittsburgh, PA, 15222, USA; email: sirajum.
munir@us.bosch.com; M. Ma, Department of Computer Science, University of Virginia, Charlottesville, VA, 22904, USA;
email: mm5tk@virginia.edu; M. S. Yasar, Department of Electrical and Computer Engineering, University of Virginia, Char-
lottesville, VA, 22904, USA; email: msy9an@virginia.edu; D. J. Stone, Departments of Anesthesiology and Neurosurgery,
and the Center for Advanced Medical Analytics, University of Virginia School of Medicine and MIT Critical Data, Lab-
oratory for Computational Physiology, Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Tech-
nology, Charlottesville, VA, 22904, USA; email: djs4v@hscmail.mcc.virginia.edu; R. Williams and H. Alemzadeh, De-
partment of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, 22904, USA; emails: {rdw,
ha4d}@virginia.edu; J. A. Stankovic, Department of Computer Science, University of Virginia, Charlottesville, VA, 22904,
USA; email: jas9f@virginia.edu.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
0360-0300/2021/02-ART130 $15.00
https://doi.org/10.1145/3419368
CCS Concepts: • Information systems → Decision support systems; • Human-centered computing → Ubiquitous and mobile computing systems and tools; • Computing methodologies → Artificial intelligence; Machine learning; • Computer systems organization → Embedded and cyber-physical systems;
Additional Key Words and Phrases: Cognitive assistant, agent-based systems for healthcare, smart health, intelligent agent, intelligent assistant, virtual assistant, virtual agent, personal assistant, healthcare application
ACM Reference format:
Sarah Masud Preum, Sirajum Munir, Meiyi Ma, Mohammad Samin Yasar, David J. Stone, Ronald Williams,
Homa Alemzadeh, and John A. Stankovic. 2021. A Review of Cognitive Assistants for Healthcare: Trends,
Prospects, and Future Directions. ACM Comput. Surv. 53, 6, Article 130 (February 2021), 37 pages.
https://doi.org/10.1145/3419368
1 INTRODUCTION
The rapid digitization of healthcare, along with advances in ubiquitous computing technology, has accelerated the development of assistive technologies for healthcare. These assistive technologies aim to support different user groups in the healthcare domain, ranging from patients to their healthcare providers. Although there are several existing surveys on assistive technologies for healthcare [4,35,42,51,61,66], only a few of them focus on cognitive assistants [35,51,66]. Most existing surveys review assistive technologies for healthcare from the perspective of application domains rather than pointing out the key technological challenges and future directions for providing cognitive assistance in healthcare. Thus, there is a knowledge gap in finding the key challenges and state-of-the-art technologies suitable for developing capable cognitive assistants for healthcare.
Meanwhile, cognitive assistance for healthcare is an emerging topic of current and future research. It poses several interesting challenges that should be addressed to create a significant impact on individual and population-level health outcomes. To bridge this knowledge gap, we provide a comprehensive survey of the existing research and state-of-the-art healthcare cognitive assistants (HCAs) in this article. While there are different perspectives on assistive technology for healthcare, ranging from neuroscience to robotics, we specifically focus on existing research on cognitive assistants for healthcare from the domains of robotics [1,7,26,47,66,84,93,110,123,135,138,142], artificial intelligence [33,109,110,117,124,129], cyber-physical systems [18,32,33,40,66,78,84,93,103,104,128,129], human-computer interaction [32,33,35,40,66,93,103,104,110], and smart and connected health [124,129,136,137].
There is no standard definition of healthcare cognitive assistants (HCAs). Our definition of HCA is inspired by existing definitions of related systems, including general cognitive assistants, intelligent agents, assistive technology for cognition, and healthcare assistants or agents. The relevant existing definitions can be found in Section 1 of the online supplemental materials.1
1There, we also compare HCAs with agents used in robotics and reinforcement learning.
We define HCAs as follows: A Cognitive Assistant for healthcare is an interactive, contextual, and adaptive system that possesses computational capabilities based on a large amount of data or explicit models of the environment and provides cognition power to improve health outcomes by either augmenting human intelligence or providing complementary assistance for cognitive impairment.
Here, an improved health outcome refers to any positive outcome for the physical, mental, and psychological health or well-being of an individual. The improvement can be achieved by providing cognitive support to (or augmenting the cognitive ability of) anyone involved in
healthcare, i.e., physicians, nurses, patients, in-home caregivers, or emergency responders. Thus, improving the efficiency of healthcare providers or augmenting their cognitive ability can be one of the goals of HCAs. Also, computational capabilities can be based on natural language processing, machine learning, computer vision, and reasoning and inference. The environment refers to the collection of situations, contexts, resources, and users (e.g., a caregiver, patient, or a human operator) that the cognitive assistant is used in or interacts with. The main contributions of this survey are the following:
(1) We provide a comprehensive definition of HCAs and identify the characteristic features of HCAs that are suitable for the underlying application domains (Section 3). We also identify the critical aspects of HCAs that are relevant to multiple domains, including but not limited to robotics, artificial intelligence, cyber-physical systems, human-computer interaction, and smart and connected health.
(2) We review and analyze existing research and the state of the art of HCAs in terms of these key features and critical cyber-physical components. We create and consolidate taxonomies of HCAs according to these features (Section 3) and cyber-physical components (Section 4).
(3) We also present the application goals/objectives of HCAs in terms of whom they assist (e.g., patients or their care providers) and the types of assistance they provide (e.g., real-time decision support or complementing a cognitive impairment) (Section 2). We identify the potential application requirements for each of these application types.
(4) We provide a set of critical challenges, future research directions, and design guidelines for the next generation of intelligent or cognitive healthcare assistants with respect to current and imminent pervasive technologies (Section 5).
A brief outline of the scope of this article and the range of existing HCA applications is presented in Figure 1. It shows the different user groups of assistive healthcare applications and the variety of situations where such applications are used. The defining characteristics (i.e., interactive, adaptive, and context-aware) and cyber-physical aspects of HCAs (i.e., sensing, actuation, and control and computation) are also presented in this figure. We considered a wide array of research on cognitive assistants, including intelligent personal assistants, personal software agents, assistive robots, virtual assistants, virtual coaches for healthcare, personalized assistants, and assistive technology used for healthcare. The goal is to identify relevant existing research even though different research communities use different terminologies to describe their work. However, only works that at least partially satisfy the proposed definition of HCAs mentioned above are included in this survey. A list of acronyms used throughout the article is presented in Table 8 in Section 6.
2 APPLICATIONS OF COGNITIVE ASSISTANT FOR HEALTHCARE
Several existing surveys on healthcare assistants provide taxonomies of healthcare applications [42,50] and present a detailed review of the surveyed applications. For instance, they cover categorization based on the intended users (i.e., patient-centered, staff-centered, healthcare-organization-centered) [51], categorization based on the functionality of applications [50,51], or categorization based on the input and output modalities [61]. While such categorizations provide an overview of healthcare applications targeted at different user roles, they do not highlight the patterns of design requirements and the technical challenges relevant to these applications. Hence, we categorize HCAs according to user roles, situations, and underlying objectives, as these factors determine the design requirements of HCAs from different application areas. We also characterize
Fig. 1. This survey focuses on reviewing healthcare cognitive assistants (HCAs). We show the dual user roles common in healthcare practice: the care recipients (on the left) and the care providers (on the right). In the blue circle, we list the different situations where HCAs are used, e.g., home healthcare, preventative medicine, and diagnostics. Inside the blue circle, we show the defining features and cyber-physical components of HCAs. The defining features of an HCA are interactive, context-aware, and adaptive (see Section 3). These features are implemented by different cyber-physical components, including sensing, actuation, control, and computation (see Section 4). Per our proposed definition of HCAs presented above, the goal of an HCA is to improve health outcomes. This is achieved by either augmenting the user's (i.e., a care provider's or a care recipient's) intelligence or providing complementary assistance for a cognitive impairment of a patient.
the essential features and cyber-physical systems aspects of the underlying technology of HCAs, as described in Sections 3 and 4. Based on our review, we categorize HCAs into the following classes:
• Patient-facing HCAs that provide pervasive cognitive assistance even in the absence of professional healthcare providers (refer to Table 1).
• HCAs that provide cognitive assistance to professional healthcare providers for scalable, efficient, and effective care delivery (refer to Table 2).
• HCAs used for training patients and professional healthcare providers (refer to Table 3).
It should be noted that the patients and professional healthcare providers mentioned above can include any of the categories depicted in Figure 1. We list HCAs for training as a separate category, since it has different application requirements than the other two categories, as shown in Table 3.
3 FEATURES OF COGNITIVE ASSISTANT FOR HEALTHCARE
Upon reviewing the current research on cognitive assistants and assistive technologies for healthcare [12,27,55,70,85], we have identified three key features that enable an HCA to provide cognitive support effectively: context-awareness, interactivity, and adaptiveness.
Table 1. Different Types of Applications That Belong to the Set of Patient-facing HCAs

• Provide decision support. Requirements: natural interaction, explainability, user context, empathy. Examples: a smartphone-based conversational agent for self-care of heart failure patients [28]; a desktop-based 3D virtual, empathetic agent for suggesting interventions to change drinking behavior [67].
• Educate patients. Requirements: natural interaction, user context, explainability, multimodal actuation. Example: a smartphone-based 3D conversational virtual agent for educating individuals with atrial fibrillation [9].
• Provide diagnostic support. Requirements: natural interaction, user context, explainability, multimodal sensing. Examples: a desktop-based conversational agent for early dementia detection based on a standard questionnaire protocol [2]; a chatbot for checking symptoms and mapping them to diseases using medical knowledge bases [34].
• Support activities of daily living (ADL). Requirements: temporal, spatial, personal, and situational context; mobility or ubiquity; adaptiveness; context-aware interaction; energy efficiency; embedded processing. Example: a mobile, autonomous robotic assistant that generates reminders for routine activities, answers a limited set of questions, and provides guided navigation for the user [93].
• Support specific cognitive challenges. Requirements: personal, situational, and spatial context; multimodal interface; empathy; embedded processing. Example: a smartphone-based personalized navigational guidance system for visually challenged individuals [3].
• Provide companionship. Requirements: personal and situational context; empathy; emotion; appearance; multimodal interaction, including verbal and nonverbal interaction; adaptiveness. Example: an emotive companion robot for the elderly population that resembles a pet cat in physical appearance [26].
• Provide counseling or psychotherapy. Requirements: personal and situational context; empathy; emotion; appearance; multimodal interaction, including verbal and nonverbal interaction; adaptiveness. Example: a robot-based anxiety management system that provides personalized therapies to reduce the user's anxiety level [1].
• Provide physical therapy. Requirements: multimodal actuation, including haptic feedback; visualization; adaptiveness; personal and situational context. Example: a wearable, physio-therapeutic system for post-surgery rehabilitation that uses haptic feedback, driven by depth sensing, to ensure safe and effective movement of a target body part or joint [104].
• Data collection & self-monitoring. Requirements: temporal, spatial, personal, and situational context; mobility or ubiquity; adaptiveness; energy efficiency. Examples: support for (i) self-monitoring by patients or (ii) collection of longitudinal behavioral data for disease management or health risk assessment [61,44].

Each application type poses some requirements, as listed above. For instance, patient-facing HCAs that provide decision support to patients should support natural interaction, provide explainable interventions, be aware of user context, and demonstrate empathy. The examples show existing systems that belong to these application types.
In the following subsections, we review the existing research in terms of each of these features: specifically, how existing HCAs implement these features, what the emerging trends are, and which potential challenges are yet to be addressed. Often, these features are intertwined; for instance, to identify the context of a user, the system needs to interact with the user or their environment. The outline of this section is summarized in Figure 2.
Table 2. HCAs to Provide Cognitive Assistance to Professional Healthcare Providers

• Diagnostic decision support. Requirements: deep understanding of diseases and their interpretation in multiple modalities (X-ray, ultrasound, CT, MRI, clinical text); natural language understanding; multimodal anomaly detection; summarizing imaging studies. Example: MedicalSieve, a radiologist cognitive assistant for an image-guided informatics system that filters the essential clinical information for diagnosis and treatment planning; it integrates data from EHR, pharmacy, labs, hospital notes, and radiology/cardiology images and videos [58].
• Patient screening and assessment, and visit documentation. Requirements: user training for adaptation; real-time sensing and contextual information retrieval; intelligent visualization; multimodal perception; hands-free interface; domain-adapted seamless verbal interaction. Example: a multimodal AR-based cognitive assistant for screening patients, visualizing patient records, and directing data acquisition and control; it supports multimodal hands-free, real-time interaction through speech, eye tracking, a head-mounted display, and a large VR display [129].
• Treatment decision support for physicians. Requirements: multimodal data analysis and perception; natural language processing; natural language understanding; domain knowledge; evidence-based treatment decisions; confidence scores for recommendations. Example: a system that integrates information and extracts knowledge from relevant guidelines, best practices, and the growing body of published medical research to suggest personalized treatment options for oncologists based on a patient's longitudinal medical records; the treatment options are ranked by level of confidence and contain supporting evidence, and 13 types of cancer are supported [137].
• ICU monitoring and analysis. Requirements: automatic integration of electronic medical records; integration of temporal, textual, image, and video data analytics; support for multiple programming languages and reusability. Example: IBM Artemis enables analytic developers within a medical institution to build and run real-time analytics on large streams of heterogeneous data from ICU patients; it has been used in neonatal and neurological ICUs [10,58].
• Psychotherapy assistant. Requirements: interface to simulate activities (e.g., driving), mobility and navigation (if applicable); haptic, audio, and olfactory stimuli; customizable stimuli delivery. Example: a VR-based assistant for treatment of combat-related PTSD; it enables physicians to personalize therapy sessions according to their patients' individual needs and to simulate VR scenarios that introduce and control a patient's real-time trigger stimuli (e.g., gunfire, vehicle crashes, explosions, and insurgent attacks) [111].
• Patient interview agent. Requirements: natural and realistic interface; verbal and nonverbal interaction; natural language understanding; dialogue management; contextual knowledge integration. Example: Ellie, a virtual human interviewer agent that conducts semi-structured interviews to initiate interactions for automatic assessment of distress indicators (i.e., behaviors associated with PTSD, depression, or anxiety) [23].
• Nurse assistant. Requirements: user-friendly interface and appearance; mobile platform; posture stability; intelligent navigation control; 3D sensing and perception; real-time monitoring and safety control. Example: a humanoid, mobile robotic nursing assistant for lifting and moving patients and heavy objects inside a hospital to increase patient and nurse safety and operational efficiency [47].
• Documentation and alert assistant for emergency response. Requirements: knowledge of emergency response protocols; speech recognition and transcription in noisy environments; device and resource constraints. Example: TraumaTracker, a smartphone-based assistant for accurate documentation of trauma resuscitation that provides visual alerts to first responders on a display or wearable smart-glass device; it uses a belief-desire-intention model-based agent and enables responders to automatically record a patient's dynamic vital signs and the results of diagnostic procedures conducted at the scene, and to validate drug dosage and administration details [22,78].
• Support responder for managing large-scale emergency. Requirements: knowledge processing; knowledge of emergency response protocols; multimodal perception; autonomous navigation; device and resource constraints; speech recognition and transcription in noisy environments. Example: cognitive mobile robotic agents that assist search-and-rescue operations in large-scale emergency situations by autonomously completing tasks instructed by human rescuers and communicating efficiently even in low-bandwidth settings; they interpret human commands expressed in natural language and gestures, scan the incident area for potential anomalies or hazards, detect objects of interest, and navigate through the emergency area [142].
• Support responder for managing emergency. Requirements: domain knowledge integration; speech recognition and transcription in noisy environments; device and resource constraints; natural language understanding and generation. Example: a voice-based cognitive assistant for real-time, protocol-driven decision support and automatic documentation for emergency medical services (EMS) providers; it extracts EMS-protocol-specific information from spoken language collected at the scene using a headset and uses that information to model and execute EMS protocols for real-time intervention suggestions that ensure patient safety [94,98,102,126].
• Robotic surgery assistant. Requirements: trocar planning and placement; extracting surgical context and using knowledge of the present context, the patient's anatomy, and the surgeon's profile to assist the surgeon's maneuvers; performing monotonous and recurring tasks that require less cognitive load. Example: ARssist, an augmented reality application that assists the first assistant (FA) in passing necessary materials to the main surgeon and helps with trocar planning and placement, manipulating hand-held instruments, and performing robot docking; it supports real-time 3D rendering of the robotic instruments, hand-held instruments, and endoscope, as well as real-time stereo endoscopy configurable to suit the FA's hand-eye coordination, via a head-mounted display [101].
• Telemedicine and tele-presence assistants. Requirements: real-time interaction; degradable service; visualization of test results and past medical history; configurable interface. Example: a smartphone-based application that enables physicians to interact in real time through a video-call interface with patients from rural or remote areas or with mobility issues; physicians can access the patient's medical records, schedule same-day appointments, and perform an initial diagnosis [5].

Each application type poses some requirements, as listed above, together with examples of existing systems that belong to these application types.
Table 3. HCAs Used for Training Patients and Professional Healthcare Providers

• Training patients: enhance cognitive functionalities. Requirements: interface for experts and therapists to configure the games; adaptiveness to the user's preferences and behavior; natural interface. Example: a mixed-reality gaming platform that uses a multi-touch touchscreen tabletop interface for preserving cognitive functionality of the elderly, including memory, reasoning, selective attention, divided attention, and categorization; it supports single-player and multi-player games and personalized content for each player [32].
• Training patients: cognitive orthotics. Requirements: real-time, autonomous, mobile perception; longitudinal assessment of the patient's condition; episodic memory retrieval. Example: a mixed-reality training platform and storyboard that helps people with dementia and declining memory interact through pen gestures, an eye tracker, a video camera, a microphone, and bio-sensors; it provides real-time contextual suggestions for performing instrumental activities of daily living, sends reminders on what to do next and how to do it, and relates this to active memory training [128].
• Training first responders. Requirements: realistic simulation of the dynamic environment of large-scale emergency events; responsive and natural interaction; assessment mechanism for performance evaluation [41,64,65]. Example: training emergency responders to find the optimal resource allocation to complete different tasks in a distributed search-and-rescue mission using a mixed-reality location-based game [106]; the game simulates real-world disaster response scenarios and enables human-agent collaboration.
• Virtual patients for physicians. Requirements: realistic user interface, facial expression, and emotion; accurate response to pain and treatment; reconfigurability for critical use cases. Example: Pediatric Hal, a wireless and tetherless pediatric patient simulator, simulates lifelike emotions (e.g., anger, ongoing pain, crying, anxiety, yawning) through dynamic facial expressions, movement, and speech; it supports providers of all levels in developing the skills needed to diagnose, communicate, and treat patients in many clinical areas, and it supports (i) real patient monitoring, including SpO2, EKG, capnography, and defibrillation, and (ii) emergency interventions, including surgical airway, needle decompression, and chest tube [121].
• Training surgeons: robotic surgery. Requirements: realistic simulation of the patient's anatomy, allowing surgeons to practice on a particular curriculum of tasks; evaluation metrics with respective scores after task completion. Example: virtual simulation-based training platforms such as SimNow [49] and dv-Trainer [76] provide objective and scalable methods for evaluating surgeons' skills and improving their training; they allow surgeons to familiarize themselves with the surgical robot and improve their hand-eye coordination by maneuvering the manipulators as well as the endoscopic camera when performing tasks based on a specified curriculum, with a physics engine that enables realistic tool-tissue interactions ranging from burning tissue to bleeding and cutting.

Each of these application types poses some requirements, as listed above, together with examples of existing systems.
3.1 Interactivity
One of the fundamental features of a CA is interacting with users. A CA can also interact with the physical environment, services, processors, devices, and other CAs. The goals of such interaction are to (i) sense the user's goal, intent, or requirement, (ii) resolve ambiguity and incompleteness, and (iii) provide cognitive assistance. An HCA can interact through adaptive multimodal interfaces and visualization techniques. In this section, we discuss several major aspects of interactivity in existing HCAs that shape the design of the underlying systems, including the entities that existing HCAs interact with, the modes of interaction, the naturalness of the interaction, and the nature of the interaction (i.e., proactive or reactive).
Fig. 2. Key features of healthcare cognitive assistants (HCAs): (i) interactivity, (ii) context-awareness, and (iii) adaptiveness. This figure also summarizes the outline of Section 3.
3.1.1 Entity of Interaction. Most HCAs directly interact only with the target user. For instance, an HCA for robotic surgery interacts with the first assistant of the human surgeon [101]. However, depending on the design requirements, HCAs often interact with multiple users. For example, RoNA [47] is a humanoid, mobile robotic nursing assistant for lifting and moving patients and heavy objects inside a hospital to increase patient and nurse safety and operational efficiency. In addition to a nurse or physician (i.e., whoever needs assistance to move a patient), it interacts with a telepresence operator through a visual interface where the operator can see and control movements to ensure safe operation. Similarly, HCAs for ADL support often interact with both the patient and their primary caregiver [66,93] or professional healthcare provider [104].
The design issues with multi-user interaction include, but are not limited to, data flow between multiple users, maintaining the privacy of each user, and providing personalized and consistent interventions/feedback to each user. In addition to human users, a single HCA can interact with other HCAs or assistive services or devices that operate in the same environment. For example, an HCA for ADL support can interact with the personal assistant of the corresponding user (e.g., Alexa or Google Home) for weather updates and schedule the user's daily routine accordingly. The potential challenges of such interactions are discussed in Section 5.
3.1.2 Mode of Interaction. Based on our literature review, the modes of interaction for HCAs can be primarily categorized as verbal and nonverbal. Verbal interaction includes textual [5,77,137], audio [2,71,77], and video [23] modes. Such interactions may take multiple rounds of information exchange to understand user intent or requirements, resolve ambiguity, and address incompleteness. Nonverbal interaction includes interaction through a haptic interface [48,62,63,92,104], a visual interface [33,47], augmented reality [18,45,69,79], virtual reality [23,111,112,134], mixed reality [128], or sensor-embedded objects [32,132,133]. Some HCAs also include olfactory interaction to enhance the simulation of a situation or to trigger a particular memory [111]. Many HCAs support multimodal interaction and often combine verbal and nonverbal modes [9,23,47,83,110,111,134]. Virtual interaction is often carried out through a 2D or 3D avatar [23,79]. Examples of different modes of interaction are presented in Table 4. Additional details about the sensors and actuators used in different modalities of interaction are discussed in Section 4; more examples of different modes of interaction are available in Section 2 of the online supplementary materials.
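To make the multi-round nature of verbal interaction concrete, the following is a minimal sketch (in Python) of a slot-filling dialogue loop that asks follow-up questions until the user's request is fully specified. The intents, slots, and prompts are hypothetical illustrations, not taken from any surveyed system:

```python
# Minimal sketch of multi-round verbal interaction: the agent keeps
# asking follow-up questions until all slots of the detected intent
# are filled. Intents, slots, and prompts are hypothetical examples.
REQUIRED_SLOTS = {
    "medication_reminder": ["drug_name", "dose", "time_of_day"],
    "symptom_check": ["symptom", "duration", "severity"],
}

FOLLOW_UP_PROMPTS = {
    "drug_name": "Which medication is this for?",
    "dose": "What is the prescribed dose?",
    "time_of_day": "When should I remind you?",
    "symptom": "What symptom are you experiencing?",
    "duration": "How long have you had it?",
    "severity": "On a scale of 1-10, how severe is it?",
}

def dialogue_loop(intent: str, filled_slots: dict) -> dict:
    """Resolve ambiguity/incompleteness by querying missing slots."""
    for slot in REQUIRED_SLOTS[intent]:
        while slot not in filled_slots:
            answer = input(FOLLOW_UP_PROMPTS[slot] + " ").strip()
            if answer:  # in a real HCA, an NLU module would parse this
                filled_slots[slot] = answer
    return filled_slots

# Example: the first utterance only revealed the intent and the drug name;
# the loop asks for the dose and the reminder time before acting.
# slots = dialogue_loop("medication_reminder", {"drug_name": "metformin"})
```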
Table 4. Examples of Modes of Interaction as Found in Existing HCAs

• Verbal. Goal: identify the intent and context of the user through natural language understanding, context detection, and dialogue management [2,23,77,85,108]. Example: Ellie conducts virtual interviews with humans to automatically assess distress indicators [23]; the indicators are behaviors associated with depression, post-traumatic stress disorder, or anxiety, and Ellie uses four classifiers for natural language understanding, context detection, and response generation.
• Message. Goal: provide reminders, interventions, or alerts and ask questions through short messages on a laptop, smartphone, or other smart display interface [66,71,83,117,128]. Examples: EMMA is a smartphone-based virtual personal assistant [33] that interacts with the user through recurring ecological momentary assessments to track their level of energy, positivity, and overall well-being; a navigational assistant for visually impaired people provides auditory feedback to alert a user about the size of an object (obstacle) and the object's distance from the user [117].
• Tactile and kinesthetic. Goal: deliver kinesthetic or tactile feedback through a smart wearable device or sensor-embedded object [64,84,104,133]. Example: GuideCane [133] supports visually impaired individuals through haptic feedback to their cane to indicate their next step while walking (e.g., go straight, stop); the user interacts through a thumb-operated joystick to control direction.
• Expression and emotion. Goal: interact with users by showing emotions or expressions through a 2D or 3D virtual avatar or robot [9,67,110]. Example: EmIR [110] is an empathetic social robot that provides cognitive support to the elderly for activities of daily life; it displays seven emotions to generate an empathetic response to users: angry, afraid, disgusted, happy, neutral, sad, and surprised.

The first entry demonstrates verbal interaction; the remaining entries show different types of nonverbal interaction. In addition to such individual modes of interaction, many HCAs support multimodal interaction combining more than one form of verbal or nonverbal interaction.
3.1.3 Natural Interaction. It is desirable that an HCA's interactions with its intended users be natural and meaning-based. Natural interaction can enhance the user acceptance and usability of HCAs. Several examples of natural interaction found in existing HCAs are presented below.
• Complex daily activities: Kognichef [83], a cognitive assistant for complex cooking activities, provides a hands-free interface for browsing a recipe when the user's hands are occupied (detected through a built-in camera) to ensure natural and effective interaction. The user can pause, alter, and skip steps of a recipe and thus always stays in charge.
• Physiotherapy: KinoHaptics [104], a cognitive assistant for physical therapy and rehabilitation, provides haptic feedback when a user performs an unsafe movement during an in-home physical therapy session. Because haptic feedback requires less attention and focus from the user than audio or visual feedback, patients can better focus on their physical therapy. It also provides intuitive feedback by showing a progress bar and a real-time animation of movement. Thus, it is more engaging and easier to interpret, as reported by the users who participated in a user study evaluating the usability of KinoHaptics.
• Navigational assistance: An emerging trend among cognitive assistants for visually impaired individuals is providing auditory [37,71,117] or haptic [63] feedback for navigation or performing an activity. Another navigational assistant for visual impairment uses a smartphone application paired with a mobile robot [84]. This mobile navigational assistant robot provides a natural interface in the smartphone app. Specifically, each sub-window on the phone screen is mapped to a predefined destination in the user's indoor environment (e.g., cafeteria, restroom). When the user finds a sub-window on the phone and taps it, the app sends a command to the mobile robot to go to that destination (see the sketch after this list).
• Enhancing cognitive ability through games: A mixed reality (MR)-based HCA for training elderly individuals through interactive games [32] provides natural interaction through a tabletop MR platform. Tabletop interfaces mainly use touchscreens and multi-touch technologies; they do not require using a mouse or a touchpad.
However, several existing HCAs lack natural interaction. For example, consider the HCA for post-stroke hand rehabilitation using spatial augmented reality [45], which simulates common hand movements using AR, including reaching, wrist-tilting, pointing, and grasping. It helps a patient practice these hand movements. However, it is not clear how the assistant responds if the user makes any hand movement other than the supported movements, which can result in an interrupted and unsafe user experience. Sainarayanan et al. [117] present a blind navigation assistant that uses sonification (recognition of an object from sound). However, the user requires a significant amount of training before using the system to interpret the feedback that comes through head-mounted gear; the system also does not deal with moving objects. Some HCAs provide a snooze option for notifications [103]. Such systems should ensure that the snooze option does not annoy users or cause any discomfort.
3.1.4 Proactive or Reactive Interaction. Most HCAs are reactive in terms of initiating the interaction. Proactive assistive services are often referred to as "detect-assistant" services and use a two-step approach [104,105,122,128]: first, the assistant detects the deficit observed in an abnormal behavior or activity, and then it proposes suitable assistance [122]. Here, we present a few examples of HCAs that support proactive interaction. KinoHaptics, an HCA for physical therapy and post-injury rehabilitation, supports proactive interaction: it tracks movement and alerts users as soon as unsafe movement is detected [104]. In the surgical robot assistant da Vinci, proactive monitoring is implemented to prevent device malfunctions [105] from affecting the outcome of safety-critical events. Kognit provides proactive feedback for instrumental activities of daily living to remind users of an activity and alert them if they make any mistake while doing it [128]. FindIt provides proactive reminders to its users if they leave behind any critical device, such as a phone or keys, while going outside [20]. Another cognitive assistant for monitoring and detecting early dementia proactively initiates diagnostic conversations with the user [2].
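The two-step detect-then-assist pattern can be summarized as a simple control loop. The sketch below is a minimal illustration in the style of the unsafe-movement alerts of KinoHaptics [104]; the joint-angle source, the safety threshold, and the alert function are hypothetical placeholders, not details of the actual system:

```python
import time

# Sketch of the two-step proactive "detect, then assist" loop. The joint
# angle source and the 40-degree safety threshold are hypothetical
# placeholders, loosely modeled on unsafe-movement alerts as in [104].
SAFE_MAX_ANGLE_DEG = 40.0

def read_joint_angle() -> float:
    """Placeholder for a depth-sensor or wearable-IMU measurement."""
    return 35.0

def deliver_haptic_alert() -> None:
    print("Vibrate: movement outside the safe range, slow down.")

def detect_assist_loop(period_s: float = 0.1, steps: int = 10) -> None:
    for _ in range(steps):
        angle = read_joint_angle()       # step 1: detect the deficit
        if angle > SAFE_MAX_ANGLE_DEG:   # step 2: propose suitable assistance
            deliver_haptic_alert()
        time.sleep(period_s)

detect_assist_loop()
```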
Existing HCAs vary drastically in terms of (i) entities of interaction, (ii) modes of interaction, and (iii) the nature of the interaction (i.e., proactive or reactive), as discussed above. Although natural interaction is critical for the usability and effectiveness of HCAs [13,26,50,51,61], several HCAs overlook this aspect or demonstrate a low degree of natural interaction [45,103,117]. The
next generation of HCAs should consider this issue from the design level and take an interdisciplinary approach to address it. We also cover the challenges regarding interaction among multiple HCAs in Section 5.2. Another relevant area of research emerging from the human-computer interaction community is how users (i.e., patients, individuals, caregivers, and professional healthcare providers) interact with HCAs [13,32,33,35,40,77,93,104,110,127].
3.2 Context-awareness
Context-awareness refers to the feature of a system that allows it to react differently according to different contexts. The system usually has some underlying representation of contexts, and it learns the context automatically, semi-automatically, or from user feedback. A healthcare CA should be able to understand, identify, and extract contextual elements from its interaction with a user and the environment. The set of contexts for HCAs can be categorized into four classes: temporal, spatial, user (or personal), and situational contexts [86].
HCAs are often designed to be temporally context-aware, i.e., they identify and respond according to the time of day, the day of the week, or some other predefined time slots. HCAs often provide location-aware interventions and use spatial context for inferring user state. For HCAs, the user context includes a user's physiological, psychological, behavioral, and medical context. HCAs can provide interventions that are customized to one or more user contexts.
• Physiological context refers to a user's age, height, weight, and other physiological factors.
• Psychological context refers to a user's emotion, mood, personality, level of positivity, and other psychological factors.
• Behavioral context encompasses a user's behavior, actions, predefined priorities or preferences, level of skills, and professional training and certification.2
• Medical context refers to a user's past medical history, present medical condition, symptoms, diagnosis, medications, genetic profile, family history, and similar medical factors.
The situational context includes environmental context, process context, and operational context. Different situations for healthcare are presented in Figure 1; for instance, these include home healthcare, remote monitoring, the ICU, surgery, telemedicine, and emergency response. In addition to temporal, spatial, and user contexts, HCAs are often aware of situational context in terms of an ongoing process, operations, or predefined protocols. Table 5 presents examples of the context-awareness of existing HCAs. An additional review of the context-awareness of existing HCAs can be found in Section 3 of the online supplementary materials.
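One way to make these four context classes operational is to represent them as a single structured record that an HCA consults before acting. The sketch below is our own illustration; the field names are hypothetical and do not correspond to a schema from any surveyed system:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Illustrative representation of the four context classes discussed
# above (temporal, spatial, user, situational). Field names are our
# own; surveyed systems each define their own, often implicit, models.
@dataclass
class UserContext:
    # physiological, psychological, behavioral, and medical sub-contexts
    age: Optional[int] = None
    mood: Optional[str] = None
    activity_level: Optional[str] = None
    medications: list = field(default_factory=list)

@dataclass
class Context:
    timestamp: datetime            # temporal context
    location: Optional[str]        # spatial context
    user: UserContext              # user (personal) context
    situation: Optional[str]       # situational context, e.g., "home care"

ctx = Context(
    timestamp=datetime.now(),
    location="home/kitchen",
    user=UserContext(age=72, mood="calm", activity_level="low"),
    situation="home healthcare",
)
```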
Identifying and considering a user's context is essential for HCAs. However, it also raises concerns regarding privacy, security, safety, and confidentiality. Researchers and developers should address these challenges while designing and developing HCAs.
3.3 Adaptiveness
An adaptive system3 refers to a system that changes its behavior in response to its environment. A healthcare cognitive assistant should be adaptive so it can accommodate the dynamic behavior of its environment (including the physical environment or ambiance of the system) and the user's goals, needs, requirements, actions, and behaviors. While generating dynamic responses is one of the essential characteristics of adaptive systems, additional characteristics include, but are not limited to, (i) resolving ambiguity, (ii) tolerating unpredictability, and (iii) learning from experience. The degree of adaptiveness can vary across systems. Based on our review of existing HCAs, there are four dimensions of adaptiveness.
2For professional care providers, including physicians, nurses, and emergency responders.
3It should be noted that context-awareness and adaptiveness are often used interchangeably in some of the existing literature. However, we distinguish between these features as follows: a system can react differently under the same context to provide a more adaptive response. To illustrate the point, a context-aware cognitive assistant for ADL support can generate specific reminders for an activity based on spatial, temporal, or situational context, e.g., suggesting outdoor exercise when the weather is nice and the user has been physically inactive for a long period [103]. However, the system is adaptive if it adapts its reminders based on the user's response, e.g., when the user repeatedly declines the reminder to perform outdoor exercise, the system adapts to the user's behavior and suggests an alternative exercise.
Table 5. Examples of Context-awareness of Existing HCAs

• Temporal (duration): Step Up Life generates notifications and reminders when the user is physically inactive for a predefined period; the user can also set "no reminder" timeslots so the assistant does not generate any exercise reminder even during a long span of inactivity [103].
• Temporal (time of the day/day of the week): EMMA, an empathetic virtual assistant for well-being, uses temporal context (time of the day and day of the week), in addition to other contexts, to infer mood [33].
• Spatial (inside/outside): Gabriel, a Google Glass-based wearable assistant for individuals with cognitive decline, performs location-aware sensor control [40] and user-activity recognition; for example, if the user falls asleep at home, it turns off the built-in camera to save battery life, and it awakens users if they fall asleep while traveling on public transport.
• Spatial (exact and relative location): EMMA uses spatial context to infer the user's mood [33], using the user's exact and relative location (e.g., distance from work/home) as spatial features to predict mood.
• Spatial (landmarks): a navigational assistant for people with cognitive impairments uses augmented reality [43]; instead of street names and distances, it focuses on user-friendly routes based on landmarks the user knows.
• User (physiological): Quro, a conversational assistant that supports symptom checking by patients, is aware of a subset of the user's physiological and medical contexts, e.g., gender, age, smoking history, and heart problems [34].
• User (psychological): EMMA recommends activities according to the user's mood to promote the emotional well-being of the user [33].
• User (behavioral): Ellie, a virtual human interview agent, generates appropriate real-time nonverbal interaction/behavior based on the conversational context and the user's facial expression and gestures [23], e.g., facial expression, eye and head movement, blinking, and gaze.
• User (medical): Babylon, a chatbot that supports self-diagnosis by patients, generates its responses according to the user's medical context [5].
• User (multiple): ODVIC, a multimodal conversational assistant that delivers evidence-based interventions for behavior change, provides interventions customized to the user's profile, past behavior, mood, and recent conversation [67]; it thus combines psychological, physiological, and behavioral contexts to provide customized interventions.
• Situational (environmental): a navigational assistant that supports path planning is aware of environmental and spatial context [88].
• Situational (process): a surgical assistant monitors the current surgical task of a surgical procedure [141] and conducts context-specific safety checks.
• Situational (operational): CognitiveEMS, a cognitive assistant for emergency response, suggests interventions to EMS responders that are specific to the context of standard EMS protocols [126].

As shown above, user context can be physiological, psychological, behavioral, medical, or a combination of any of these user contexts.
Table 6. Examples of Adaptiveness of Existing HCAs

• User's action (velocity of movement): Pearl, a mobile robotic assistant, acts as a navigational guide and ADL assistant for the elderly and adapts its velocity to the user's; Pearl estimates the user's velocity and adjusts its speed accordingly [93].
• User's behavior (interventions performed by emergency medical services (EMS) responders): CognitiveEMS is a cognitive assistant that provides real-time decision support to EMS responders; it monitors a patient's condition and the user's actions (i.e., the actions performed by an EMS responder to manage the emergency scene) and suggests interventions according to EMS protocols [94,98,99,126].
• User's need (errors made by the user): a cognitive assistant that supports visually challenged people in meal preparation provides vocal instructions as the user proceeds with preparing their meal [36]; it suggests adaptive interventions customized to the types of error the user most frequently makes (e.g., initiation, planning, attention, and memory deficits).
• Environment (network failure and computational resources): Gabriel, a Google Glass-based wearable assistant for ADL support, is adaptive to network failures and the unavailability of remote tiers [40]; it performs computation on server hardware when the network is available to save device energy and increase service speed, and in case of network failure, the system offloads computation to a fallback device, e.g., the user's smartphone.

For instance, the first entry describes Pearl, a mobile robotic assistant that adapts its mobility to the user's action, i.e., their velocity.
• User's action: HCAs can be adaptive to users' actions, i.e., the system monitors the user's activities and adjusts intervention suggestions accordingly [8,18,83,93,94,98,99,126].
• User's behavior: HCAs are often designed to be adaptive to users' verbal and nonverbal behaviors [23,26,88,93].
• User's need: This refers to an HCA adapting its response to a user's (e.g., a patient's or caregiver's) cognitive needs as the user's condition (e.g., disease, stress level, psychological state) changes over time [8,36,104,111].
• Environment: HCAs are designed to respond dynamically to changes in the surrounding environment. For example, many HCAs developed for navigational assistance are adaptive to the surrounding environment [69,84,133], e.g., obstacles, visibility, and illumination.
Table 6 demonstrates examples of the different dimensions and degrees of adaptiveness found in existing HCAs. An additional review of adaptiveness in existing HCAs can be found in Section 4 of the online supplementary materials.
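To make the distinction drawn in footnote 3 concrete, the following sketch shows a reminder policy that is context-aware (it checks weather and inactivity) and also adaptive (it switches suggestions after repeated declines). The thresholds and activity names are hypothetical illustrations:

```python
from typing import Optional

# Sketch of a reminder policy that is both context-aware (weather,
# inactivity) and adaptive (switches suggestions after repeated
# declines), following the example in footnote 3. The thresholds and
# activity names are hypothetical.
DECLINE_LIMIT = 3

class AdaptiveReminder:
    def __init__(self) -> None:
        self.declines = {"outdoor walk": 0, "indoor stretching": 0}

    def suggest(self, weather_is_nice: bool,
                inactive_minutes: int) -> Optional[str]:
        if inactive_minutes < 120:
            return None  # context says no reminder is needed yet
        activity = "outdoor walk" if weather_is_nice else "indoor stretching"
        if self.declines[activity] >= DECLINE_LIMIT:
            # adaptation: the user keeps declining, so suggest the alternative
            activity = ("indoor stretching"
                        if activity == "outdoor walk" else "outdoor walk")
        return activity

    def record_decline(self, activity: str) -> None:
        self.declines[activity] += 1

reminder = AdaptiveReminder()
print(reminder.suggest(weather_is_nice=True, inactive_minutes=150))
# -> "outdoor walk" (until declined DECLINE_LIMIT times)
```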
3.4 Limiting Features of Existing HCAs
• Context-aware and adaptive assistance require memory: The assistant needs to remember past interactions and then respond according to the specific application in the current circumstances. Besides, the assistant should be able to distinguish anomalies from a new behavior pattern to provide accurate and satisfactory cognitive assistance. However, these two issues are overlooked in most existing HCAs.
• Uncertainty and ambiguity: HCAs should be adaptive to ambiguity and uncertainty, i.e., they should be able to "resolve ambiguity and tolerate unpredictability" [85]. They should define a problem specifically by asking additional questions or utilizing additional input sources to resolve ambiguity and incompleteness. Although the spectrum of resolving ambiguity and unpredictability can vary across different applications, the majority of current HCAs do not address ambiguity and uncertainty.
Fig. 3. Cyber-physical components of healthcare cognitive assistants (HCAs): (i) sensing, (ii) actuation, and (iii) control and computation. HCAs sense the user's overall state (e.g., the user's needs, behavior, and context) and actuate on the user to perform interventions. Optionally, some HCAs may sense and actuate on the user's surrounding environment, which could be smart, e.g., a smart home, a virtual environment, or an augmented reality-based environment.
• Preserving privacy, confidentiality, and security: Most of the HCAs we reviewed overlook privacy, confidentiality, or security issues. However, one of the most significant differences between HCAs and other cyber-physical systems is the actuation target of the system, i.e., human beings. As discussed in previous sections, HCAs are context-aware and adaptive, which also means that they require more personal information from humans and take actions on humans. For example, a smart reminder system gives reminders to patients based on their daily living habits, which may also reveal a person's other activities; a navigation system for visually impaired people knows where and when they go most of the time. Protecting this private information from leakage or malicious usage is a significant challenge for HCAs. Also, malicious attacks can result in adverse or even fatal outcomes for users (e.g., errors in robotic surgery or wrong or unsafe medication dosages) [139]. Medical devices and HCAs might be hacked to cause harm or be targeted by ransomware [29].
4 PHYSICAL AND CYBER COMPONENTS FOR COGNITIVE ASSISTANCE IN HEALTHCARE
In this section, we review the cyber-physical system (CPS) components of HCAs, including (i) sensing modality (i.e., detection/perception), (ii) actuation (i.e., response and intervention), and (iii) control and computation. These CPS components are essential for HCAs, as they enable the key features of HCAs. For instance, proper sensing and actuation modules enable an HCA to be interactive, and through sensing and control/computation an HCA can detect and identify different contexts. The interaction between a user and an HCA through different CPS components is shown in Figure 3. It also shows that an HCA often interacts with the user's surrounding environment as well (e.g., activity recognition in a smart home, sensing and actuating on a user's virtual environment).
4.1 Sensing Modality: Detection and Perception
This section presents an overview of the sensing and perception technologies used in existing HCAs. The sensors can be roughly categorized into six classes:
(1) Primitive sensors: These include, but are not limited to, PIR motion detectors, temperature sensors, contact sensors, light sensors, and humidity sensors.
(2) Physiological sensors: While primitive sensors capture environmental states, physiological sensors (e.g., pulse oximeters, blood glucose monitors, heart rate sensors, EKG, blood pressure, and skin conductance sensors) measure a patient's physiological state. Physiological sensors that are small, provide accurate sensing and wireless communication, and offer a user-friendly interface are more suitable for use in HCAs.
(3) Acoustic and ultrasonic sensors: These are often used for environmental sensing and obstacle detection to support the navigation of visually challenged individuals. In addition, several conversational HCAs use acoustic sensors (e.g., a microphone array or a built-in microphone in smart devices) to recognize a user's speech and detect the emotions or semantics of their speech.
(4) RGB cameras: These are often used in HCAs for (i) sensing the surroundings of a visually challenged individual (e.g., obstacles, object detection, people identification, localization) to support navigational assistance, (ii) detecting nonverbal interaction (e.g., facial expression, gaze, emotion, empathy), and (iii) fine-grained activity recognition.
(5) RGB-D and depth sensors: Depth sensors provide a more privacy-preserving approach for detecting objects and sensing the surroundings of an individual. Thus, they are an alternative to RGB cameras for navigational assistance, localization, and object detection. In addition, HCAs often use depth sensors for tracking body gesture and pose to support real-time monitoring of psycho-motor exercises and physiotherapy sessions.
(6) GPS and Bluetooth low energy (BLE) beacons: These sensors are used in mobile HCAs to support people or object tracking, localization, and navigational assistance.
Table 7 demonstrates examples of the usage of these six classes of sensors in existing HCAs. Additional examples of different sensing techniques used in existing HCAs are available in Section 5 of the online supplementary materials. We also review the usage of multimodal sensing in existing HCAs.
4.1.1 Multimodal Sensing. Several HCAs rely on multimodal sensing to provide multimodal interaction [83, 93], assist users with multiple cognitive functions [93, 135], act as robotic surgery assistants [123, 138], and support augmented reality (AR), virtual reality (VR), or mixed reality (MR) interfaces [109]. For instance, the relative location and motion of the user's head need to be determined to accurately adjust the projected image or holographs in headset-based VR or AR applications. This is achieved using an Inertial Measurement Unit (IMU) that combines an accelerometer, a gyroscope, and a magnetometer. By combining the relative position information from these three sensors, the user's head position and movement are accurately tracked.
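As a concrete illustration, the following is a minimal sketch of such sensor fusion using a simple complementary filter; the filter constant and axis conventions are illustrative assumptions, and production headsets use more sophisticated estimators (e.g., Kalman or Madgwick filters) that also fuse magnetometer data for yaw.

```python
import math

# Hypothetical sketch: a complementary filter fusing gyroscope and
# accelerometer readings to track head orientation (pitch, here).

ALPHA = 0.98  # trust placed in the gyroscope's short-term accuracy

def update_pitch(pitch_prev, gyro_rate_dps, accel_x, accel_y, accel_z, dt):
    """Estimate head pitch (degrees) from one IMU sample.

    gyro_rate_dps: angular rate around the pitch axis (degrees/second)
    accel_*: accelerometer readings (any consistent unit, e.g., g)
    dt: time since the previous sample (seconds)
    """
    # Gyroscope: integrate angular rate (accurate short-term, drifts long-term).
    pitch_gyro = pitch_prev + gyro_rate_dps * dt
    # Accelerometer: infer pitch from the gravity vector (noisy but drift-free).
    pitch_accel = math.degrees(math.atan2(accel_x,
                                          math.sqrt(accel_y**2 + accel_z**2)))
    # Blend the two: the gyro dominates short-term, the accel corrects drift.
    return ALPHA * pitch_gyro + (1 - ALPHA) * pitch_accel
```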
For navigational assistance. Ribeiro et al. [109] propose an auditory augmented reality system that integrates acoustic virtual objects into the real world to assist people with visual impairments. The goal is to leverage individuals' innate abilities of sound source identification and separation to perceive nearby objects. The subject wears a helmet instrumented with an RGB camera, an IMU, and headphones. A 3D gyroscope is used to track the head, and a 3D accelerometer is used to infer the floor plane by estimating the gravity vector. The RGB-D stream is used to infer high-level features of interest (face detection and recognition, floor mapping for navigation, and plane detection). The detected high-level features are conveyed to the user using pre-recorded wave files or a text-to-speech synthesizer, after spatializing each sound.
Table 7. Examples of six classes of sensors as found in existing HCAs. The first column lists the classes of sensors, the second column contains the set of relevant tasks for each class, and the third column demonstrates an example usage of the relevant sensors in existing HCAs. In addition to individual modes of interaction, many HCAs support multimodal sensing, as discussed in Section 4.1.1.

Types of sensors | Relevant tasks/usage | Example HCA

Primitive | Occupancy detection, event detection, activity recognition and monitoring [8, 36, 110] | A robot assistant uses environmental sensors such as CO2, humidity, temperature, propane, and butane sensors to trigger alarms for potentially risky events, e.g., leakage of propane gas or a high level of CO2 [110].

Physiological | Sense and measure physiological state, e.g., heart rate, blood pressure, blood glucose [9, 128, 130, 134] | A smartphone-based conversational assistant to promote self-care in people with atrial fibrillation (AF) [9] uses the AliveCor Kardia mobile heart rhythm monitor, a sensor validated for AF screening; the device is attached to a smartphone and transmits data via Bluetooth. A virtual coach to improve exercise performance [134] relies on a VR bike frame and several physiological sensors to capture the user's brain activity and other vitals while using the bike; it uses an electroencephalogram to capture brain activity and also captures heart rate, respiration rate, bike pedal rate, and the power exerted by the user on the bike.

Acoustic and ultrasound | Speech recognition [9, 23, 83, 110, 111], obstacle detection for blind navigation [95, 125, 133], surrounding environment sensing [54] | GuideCane equips a cane with ultrasonic sensors for obstacle detection and to help visually challenged people go around obstacles [133].

RGB camera | Navigation [69, 84, 117], ADL support [18, 75, 135], and nonverbal interaction [20, 110] | A mobile robot for visually challenged people uses an on-board camera to navigate by tracking pre-deployed markers (or stickers) on a floor [84]. A robot is equipped with a camera to capture facial images that are used for people identification and emotion classification [110].

RGB-D and depth sensors | Gesture and pose detection for psychomotor exercise and physiotherapy [11, 16, 17, 39, 57, 60, 79] | A cognitive assistant for remote physiotherapy monitors the patient's exercise session at home through a Kinect [104]; it keeps track of the movements of selected joints to provide haptic feedback through the patient's armband.

GPS | Navigation, tracking, and localization [3, 30, 71, 88, 103, 118] | Step Up Life uses a smartphone's GPS sensor, cell ID, and Wi-Fi details to track the user's location [103]; it tracks the user's activities using the phone's accelerometer and magnetometer to generate exercise reminders.
Lee et al. [63] mount an RGB-D sensor and an IMU on a pair of glasses, instead of mounting them on a helmet [109], to build a navigational assistant. A smartphone is used to specify the destination.
Supporting cognitive decline. Vorobieva et al. develop a robotic system to assist people who are losing their autonomy, e.g., disabled or elderly individuals [135]. The system has a gripper with a stereo camera (for visual servoing, or vision-based robot control), pressure sensors, and an optical barrier to detect when an object is in the gripper. The user can request the robot to find an object from a predefined list of objects. The robot then navigates through the environment to pick up the object and bring it to the user. The system also aims to stimulate the cognitive state of the user by playing games.
Supporting multiple functionalities and complex activities. Pollack et al. develop a robotic assistant for cognitive orthotic functions (i.e., providing context-aware and adaptive reminders for activities of daily living) and safe navigation for the elderly [66, 93]. The system utilizes SICK laser range finders and sonar sensors for navigation, microphones for recognizing the user's speech, and a touchscreen display to detect user needs. It utilizes a camera data stream for face detection, activity recognition, and object tracking and detection for navigation support. It deploys multimodal sensing for navigation by combining sensor data streams corresponding to user localization, object detection, and tracking. KogniChef [83] is a cognitive cooking assistant for preparing a meal. It uses a Kinect RGB-D sensor and a thermal camera to perform object detection, tracking, and grasp detection. A scale is used in addition to the cameras to estimate the fill level when pouring ingredients. A microphone array is used for speech recognition, and a speaker is used to provide feedback.
For robotic surgery. Shademan et al. present the “Smart Tissue Autonomous Robot (STAR)” for automating soft tissue surgical tasks and providing surgeons a collaborative platform for decision-making and execution of surgical tasks [123]. The STAR system utilizes 3D plenoptic vision, near-infrared fluorescent (NIRF) imaging, sub-millimeter positioning, actuated surgical tools, and force sensing to construct and execute surgical tasks. The combination of “NIRF technology and 3D quantitative plenoptic imaging” addresses the problems of occlusion and target tissue recognition by observing “luminescent NIRF markers” [123].
4.2 Actuation Modality: Response and Interventions
Based on the type of tasks digital assistants perform, they can be categorized into three classes [70]: (i) a personal assistant or butler that performs tasks on behalf of the user, (ii) a cognitive orthotic that provides adaptive, contextual feedback and reminders to people with cognitive impairment or decline, and (iii) a mentor or coach. Based on our review, most current HCAs fall into the second category. However, this categorization does not cover HCAs that enhance cognitive functionalities (e.g., HCAs for training healthcare providers or providing them with decision support). So, instead of following the taxonomy above, we present the different modalities of actuation in existing HCAs in this section.
4.2.1 Visual. A dashboard or display-based system is one of the most common forms of actuation and often provides visual guidance to users to perform a task properly. Pearl utilizes a touch-sensitive graphical display for ADL reminders and navigational instructions [93]. KogniChef [83], an ADL assistant specifically designed for complex cooking tasks, uses a display to inform the user about the current state of cooking through structured visual information to reduce cognitive load. A physical rehabilitation HCA provides visual feedback on a user's specific physical movements during a physiotherapy session [104], enabling the user to visualize their movements. The STAR system [123] provides suture automation software that displays a geometrically optimized suture plan in real time. If the placement of the suturing tool is problematic, the surgeon has the option of intervening and making adjustments.
Visual actuation often consists of contextual and adaptive textual interventions. EmIR [110], a cognitive assistant for emotional well-being, uses textual messages to provide contextual reminders and recommendations and to persuade users to engage in activities that lift their emotional state. Another cognitive orthotic HCA, developed for meal preparation, alerts the user when a missing or wrong step is detected [8]. It sends instructions using not only text but also figures to better explain the missing or wrong cooking step to the user.
4.2.2 Audio. Conversational HCAs often implement verbal communication through audio [2, 9, 23, 83, 110, 111] with an underlying text-to-speech conversion and transcription module, e.g.,
Google Speech, IBM, or CMU Sphinx. In addition to such verbal communication, the audio mode is often used for nonverbal interaction [3, 69, 88, 109, 118] and cognitive orthotics [83, 93]. The most common form of such actuation is found in navigational assistants, where the navigation system gives the user step-by-step instructions through earphones [3, 69, 88, 118]. As an example of cognitive orthotics, Pearl [93] utilizes built-in speakers for speech synthesis to answer user queries and provide ADL reminders.
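As a small illustration of this actuation mode, the following sketch uses the open-source pyttsx3 text-to-speech library to voice an ADL reminder; the library choice, reminder text, and speech rate are our own illustrative assumptions, not details of Pearl [93] or any other surveyed system.

```python
import pyttsx3  # offline text-to-speech library

# Hypothetical sketch: speaking an ADL reminder through a built-in
# speaker. The reminder text and rate setting are illustrative.
engine = pyttsx3.init()
engine.setProperty("rate", 150)  # words per minute; slower for clarity
engine.say("It is time to take your afternoon medication.")
engine.runAndWait()
```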
4.2.3 Haptic. Haptic feedback can be kinesthetic, tactile, or a combination of both. Kinesthetic feedback refers to the haptic sensation felt by the muscles, joints, or tendons; kinesthetic actuation usually includes weight and stretch. Tactile actuation, in contrast, refers to the haptic sensation felt by the surface of the body and includes vibration, pressure, and texture. Most of the HCAs reviewed in this article that use haptic feedback use tactile feedback. Haptic feedback can address the issue of accessibility to some extent, since it can be preferable to audio and visual feedback for people with declining auditory and visual perception, respectively. It is also useful for implementing a hands-free interface, since the user may be engaged in an activity (e.g., physical exercise or therapy) and unable to hold a device.
Khademi et al. develop an augmented reality rehabilitation system that uses haptic feedback to enable patients with stroke to practice their hand and arm movements without the presence of a physical therapist [56]. Phamduy et al. develop a novel belt that provides tactile stimulation on the abdomen for situational awareness and obstacle avoidance by integrating micro fiber composites into the belt [92]. Nguyen et al. build a way-finding system deployed on a mobile robot that a user follows to navigate [84]. The user selects a destination from a predefined set using a smartphone. While the user follows the robot, feedback from the robot is encoded as tactile vibrations of the smartphone. There are four types of vibrations to suggest “turn left,” “turn right,” “go straight,” and “stop.” The navigational assistants presented in References [62, 63] use tactile feedback through a vest. Four vibration motors integrated into the vest are controlled wirelessly to provide four navigation cues: straight (“no tactile sensors on”), stop and scan (“all tactile sensors on”), turn left (“top-left sensor on”), and turn right (“top-right sensor on”). The authors argue that the vest-type interface reduces the cognitive burden on the user compared to audio-based navigation feedback.
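As an illustration, the four cues described above can be encoded as activation patterns over the vest's four motors, as in the following sketch; the motor indexing and function names are illustrative assumptions, not details from References [62, 63].

```python
from enum import Enum

# Hypothetical encoding of the four navigation cues as on/off patterns
# over four vest motors (0 = top-left, 1 = top-right, 2 = bottom-left,
# 3 = bottom-right). The layout is illustrative.

class Cue(Enum):
    STRAIGHT = "straight"       # no motors on
    STOP_AND_SCAN = "stop"      # all motors on
    TURN_LEFT = "left"          # top-left motor on
    TURN_RIGHT = "right"        # top-right motor on

CUE_PATTERNS = {
    Cue.STRAIGHT: [False, False, False, False],
    Cue.STOP_AND_SCAN: [True, True, True, True],
    Cue.TURN_LEFT: [True, False, False, False],
    Cue.TURN_RIGHT: [False, True, False, False],
}

def drive_motors(cue: Cue) -> list[bool]:
    """Return the on/off state for each of the four vest motors."""
    return CUE_PATTERNS[cue]
```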
KinoHaptics, an HCA for self-care and post-surgery rehabilitation, monitors the patient's physiotherapy session and provides real-time haptic feedback through the patient's armband [104]. The vibro-haptic feedback is generated to ensure the user does not overdo or under-perform an exercise suggested for physiotherapy. The armband contains an array of vibration motors and connects to the feedback-generating server machine via Bluetooth. Step Up Life uses haptic feedback for cognitive orthotics, specifically for physical activity and movement. If it observes prolonged inactivity, it notifies the user with an exercise suggestion by generating haptic vibrations using the cell phone's vibration motor. The duration of the haptic vibration depends on the number of times the user has snoozed a notification [103].
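A minimal sketch of such snooze-dependent feedback is given below; the base duration, growth step, and cap are illustrative assumptions, not parameters reported for Step Up Life [103].

```python
# Hypothetical sketch: the vibration lasts longer each time the user
# snoozes the reminder, up to a cap. All constants are illustrative.

def vibration_duration_ms(snooze_count: int,
                          base_ms: int = 400,
                          step_ms: int = 300,
                          cap_ms: int = 2000) -> int:
    """Lengthen the vibration with each snooze, up to a cap."""
    return min(base_ms + snooze_count * step_ms, cap_ms)
```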
4.2.4 Multimodal Actuation. Several existing HCAs perform multimodal actuation to support multiple cognitive functionalities and to provide natural, realistic interaction, often through an AR, VR, or MR interface.
Weede et al. presented a surgical robotic assistant that provides two interventions: (i) knowledge-based camera guidance that provides an optimal view of the surgical workspace and (ii) port and setup planning that provides an optimal position for inserting the endoscope and the two end-effectors into the patient's body [138]. Rizzo et al. developed a virtual assistant for psychotherapy that simulates traumatic events based on a patient's description [111]. It provides general navigation for driving in the simulated scenario using a standard gamepad.
Fig. 4. An emerging trend in HCAs is the use of AR, VR, and MR, as demonstrated in the HCAs above. (A) Shows a VR-based exposure therapy platform to simulate trauma-inducing events based on the narration of people suffering from combat-related PTSD [111]. It supports simulating general navigation for driving, dismounted foot patrol, and holding a mock M4 gun, and generating audio, vibrotactile, and olfactory stimuli. (B) Presents a use case of medication management and tracking in Kognit, an MR-based assistant for elderly individuals [128]. (C) Shows ElderGames, an MR-based game to improve cognitive functions of elderly individuals [32]. It provides natural interaction through multi-touch technology, where multiple players can play together using pens on the table top. Here, real objects (i.e., pens) are used to interact with virtual ones (i.e., virtual objects displayed on the touch-sensitive and interactive table top). (D) Demonstrates configurable visualization of a stereo endoscopy in ARssist [101], an HCA for real-time cognitive support of a first assistant (FA) in robotic surgery. In the left figure, the endoscopy is shown in a virtual display to enable the FA to visualize both the surgical field and the endoscopy with minimal head rotation. In the right figure, the endoscopy and the instruments are rendered inside the patient's body; this enables the FA to intuitively operate instruments into the endoscopic field-of-view, even with an inconvenient docking configuration of the robotic arms. (E) Presents the “da Vinci Si surgeon's console” and the “Skills Simulator backpack” that use VR for training and evaluating robot-assisted surgery skills [38]. [AR: Augmented Reality, VR: Virtual Reality, MR: Mixed Reality]
It also provides the option to simulate the context of a dismounted foot patrol and a user-held mock M4 gun through a thumb mouse attachment. It provides audio, vibrotactile, and olfactory stimuli to users for a realistic simulation of the traumatic event. The system is shown in Figure 4(A).
ARCoach [18] is a task-reminder system that assists individuals with cognitive impairments by providing cues to complete tasks, detecting incorrect steps on-the-fly, and helping to correct a task. Unlike other approaches that require users to match picture cues with reality, ARCoach overlays artificial information on real-world images captured through a webcam. The overlaid information can be text, sounds, pictures, or a combination of these. CARA [69] is a cognitive augmented reality assistant for the blind. Using a head-mounted Microsoft HoloLens, CARA uses onboard video and infrared sensors to construct a 3D map of the surrounding space. Then each object in the scene
generates a voice that comes from the location of the object. As the object gets closer to the user, the pitch of the voice increases. CARA helps blind users avoid obstacles, navigate, interpret scenes, and support the “formation and recall of spatial memories.”
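A minimal sketch of such a distance-to-pitch mapping follows; the frequency range, sensing range, and linear interpolation are illustrative assumptions, not parameters reported for CARA [69].

```python
# Hypothetical sketch: map an object's distance to a voice pitch so
# that nearer objects sound higher, as described for CARA above.

def object_voice_pitch(distance_m: float,
                       min_pitch_hz: float = 220.0,
                       max_pitch_hz: float = 880.0,
                       max_range_m: float = 5.0) -> float:
    """Map object distance (meters) to a voice pitch (Hz)."""
    # Clamp distance into the supported sensing range.
    d = min(max(distance_m, 0.0), max_range_m)
    # Linear interpolation: far objects -> low pitch, near -> high pitch.
    closeness = 1.0 - d / max_range_m
    return min_pitch_hz + closeness * (max_pitch_hz - min_pitch_hz)
```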
Parsons et al. [91] use VR to train people with autism spectrum disorders to enhance their social skills. The key idea is to provide a safe virtual environment in which to practice social events by performing role-play in different contexts. Kognit aims to help dementia patients by leveraging mixed reality [128]. The authors describe their approach as therapeutic in that it enhances the cognitive abilities of dementia patients. Kognit produces new episodic memory visualizations by allowing physical and virtual objects to co-exist and interact with each other. An example use case is shown in Figure 4(B), where mixed reality is used to monitor the medication-taking behavior of an elderly person.
4.3 Control and Computation
In this section, we review different aspects of the control and computation components of existing HCAs. It should be noted that some computational models relevant to sensing/perception and actuation have already been discussed in the previous sections and subsections. Here, we present additional insights regarding control and computational models.
4.3.1 Underlying Control and Computational Model.
Data-driven Model. HCAs often use off-the-shelf trained machine learning models for different sub-tasks. To name a few: navigational assistants require object detection and scene interpretation [7, 20, 120]; conversational assistants require facial expression detection [14, 110], emotion detection [110], natural language understanding [2, 23, 28, 112], and dialogue management [2, 9, 23]; companion robots and ADL-support HCAs require activity recognition [93] and person identification [110].
Most existing HCAs use off-the-shelf trained machine learning models for these tasks. For instance, Jaime et al. [110] use a web service for person identification and emotion recognition from images due to the limited computation capacity of the robot assistant: the robot captures facial images and sends them to the web service, where the images are processed and the results are returned to the robot in real time. A cognitive orthotic HCA for cooking [8] uses separate models for different underlying components, including activity recognition, human-machine interaction events, detection of behavior or activity errors, error characterization, and diagnosis regulation; the outputs from these models are then integrated to monitor, detect, and assist in cooking. Another approach is to develop application-specific data-driven models separately and integrate them into the HCA. Consider Pearl [93], an HCA for the elderly that provides real-time navigational guidance and adaptive, context-aware reminders for ADL. Pearl uses a quantitative temporal Bayes net for activity modeling and inference. It adopts a hierarchical variant of a “partially observable Markov decision process” (POMDP) as the control architecture to mitigate the significant level of noise in the assistant's perception, which originates from the laser range-finder sensors and from user input via microphone and touchscreen.
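To give a concrete flavor of this control style, here is a minimal sketch of the belief tracking at the heart of any POMDP controller, assuming a toy two-state user model ("idle" vs. "busy") with invented probabilities; Pearl's actual controller [93] is a hierarchical POMDP over far richer state, observation, and action spaces.

```python
# Toy POMDP belief tracking: all probabilities below are invented.

# P(next_state | state): users tend to stay in their current state.
TRANSITION = {"idle": {"idle": 0.8, "busy": 0.2},
              "busy": {"idle": 0.3, "busy": 0.7}}
# P(observation | state): sensors detect activity imperfectly.
OBSERVATION = {"idle": {"no_motion": 0.7, "motion": 0.3},
               "busy": {"no_motion": 0.2, "motion": 0.8}}

def update_belief(belief, observation):
    """One step of Bayesian belief update (predict, then correct)."""
    # Predict: propagate the belief through the transition model.
    predicted = {s2: sum(belief[s1] * TRANSITION[s1][s2] for s1 in belief)
                 for s2 in TRANSITION}
    # Correct: weight by the likelihood of the new observation.
    unnorm = {s: predicted[s] * OBSERVATION[s][observation] for s in predicted}
    total = sum(unnorm.values())
    return {s: p / total for s, p in unnorm.items()}

belief = {"idle": 0.5, "busy": 0.5}
belief = update_belief(belief, "no_motion")
# Act only when confident the user is idle (and thus receptive).
if belief["idle"] > 0.8:
    print("deliver ADL reminder")
```

Hierarchical variants such as Pearl's keep this computation tractable by decomposing the decision problem across levels of abstraction.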
The models are often developed and adapted on-the-fly through user training and longitudinal usage, resulting in personalized models. NavCog [3] uses reinforcement learning to generate step-by-step instructions personalized to the mobility skill of a user. It builds a user-specific behavior model to ensure successful navigation. When there is not enough data for a new user, it uses transfer learning techniques to apply other users' data to the new model. Mattos et al. develop a speaker-independent, language-independent model that assists patients with hearing impairments in reading lips, using Generative Adversarial Networks (GANs) to learn mouth images [74]. It uses synthetic 3D models for training and videos collected from real subjects for testing.
Knowledge-driven Model. HCAs often use a knowledge-driven approach for control and computation. In CognitiveEMS [126], one of the proposed approaches for real-time decision support
through EMS protocol-specific intervention suggestions is modeling the EMS protocols using a Behavior Tree, a computational model for knowledge representation that uses a dynamic data structure that adapts to an ongoing process or incoming information flow; a minimal sketch of this pattern is given below.
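The following sketch shows a tiny behavior tree with sequence, condition, and action nodes; the node design and the protocol fragment (checks and interventions) are invented for illustration and are not CognitiveEMS's actual implementation [126] or any real EMS protocol.

```python
# Minimal behavior-tree sketch: a Sequence runs its children in order
# and fails fast; Condition and Action are leaf nodes.

SUCCESS, FAILURE = "success", "failure"

class Sequence:
    """Succeeds only if all children succeed, evaluated in order."""
    def __init__(self, *children):
        self.children = children
    def tick(self, state):
        for child in self.children:
            if child.tick(state) == FAILURE:
                return FAILURE
        return SUCCESS

class Condition:
    """Leaf node: checks a predicate against incoming scene information."""
    def __init__(self, predicate):
        self.predicate = predicate
    def tick(self, state):
        return SUCCESS if self.predicate(state) else FAILURE

class Action:
    """Leaf node: emits an intervention suggestion to the responder."""
    def __init__(self, suggestion):
        self.suggestion = suggestion
    def tick(self, state):
        state.setdefault("suggestions", []).append(self.suggestion)
        return SUCCESS

# Invented protocol fragment: if breathing is inadequate, suggest oxygen.
protocol = Sequence(
    Condition(lambda s: s.get("respiratory_rate", 16) < 10),
    Action("administer supplemental oxygen"),
)

state = {"respiratory_rate": 8}
protocol.tick(state)          # re-ticked as new information arrives
print(state["suggestions"])   # ['administer supplemental oxygen']
```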
In KinoHaptics [104], a patient's physiotherapist develops a specific, personalized exercise program. The program contains critical information, including the angle of elevation of a joint during physiotherapy sessions, how frequently the joint should be moved, and the duration of an exercise session. This exercise configuration file is loaded on a local desktop or laptop at the patient's home. The Kinect-based monitoring system (integrated with the local machine) then monitors the user's movements during a physiotherapy session and generates vibro-haptic feedback to notify the user if they are overdoing or under-performing a movement that involves the selected joint [104]. A hypothetical example of such a configuration is sketched below.
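To make this concrete, here is a hypothetical exercise configuration and a matching movement check; the field names, thresholds, and logic are illustrative assumptions rather than KinoHaptics's actual file format [104].

```python
# Hypothetical exercise configuration of the kind described above;
# all fields and values are illustrative.

exercise_config = {
    "patient_id": "example-001",
    "joint": "left_elbow",
    "target_elevation_deg": {"min": 45, "max": 90},  # prescribed range
    "repetitions_per_session": 15,
    "session_duration_min": 20,
}

def check_movement(measured_angle_deg: float, config: dict) -> str:
    """Compare a Kinect-measured joint angle against the prescription."""
    lo = config["target_elevation_deg"]["min"]
    hi = config["target_elevation_deg"]["max"]
    if measured_angle_deg > hi:
        return "overdoing"        # would trigger corrective haptic feedback
    if measured_angle_deg < lo:
        return "under-performing"
    return "ok"

print(check_movement(100, exercise_config))  # overdoing
```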
Weede et al. present a prototype of a cognitive system for minimally invasive surgery that leverages knowledge about the workflow of surgical interventions, acquired by collecting trajectories from different surgical contexts [138]. Their implementation includes several control modes that can be called upon depending on the surgical context: teleoperation, hands-on mode, and autonomous camera guidance.
Knowledge can also be represented as a rule base. Emma [33], a virtual assistant to promote psychological and mental wellness, utilizes a rule base to generate (i) predefined responses and (ii) intervention suggestions for emotionally appropriate micro-activities based on frequent ecological momentary assessment (EMA) surveys collected from the user. The rule base is authored by professional care providers; a toy example of such a rule base follows this paragraph. Another virtual assistant for mental health [110] presents arguments to persuade users toward an intervention based on analogy, popular practice, or expert opinion.
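Below is a toy sketch of such a rule base, mapping hypothetical EMA responses to invented micro-activity suggestions; the actual rules in Emma [33] are authored by care providers and are not reproduced here.

```python
# Hypothetical care-provider-style rules: each pairs a condition over
# the latest EMA response with a suggested micro-activity. EMA fields
# (mood, energy, stress on 1-5 scales) and activities are invented.

RULES = [
    (lambda ema: ema["mood"] <= 2 and ema["energy"] <= 2,
     "Try a 2-minute guided breathing exercise."),
    (lambda ema: ema["mood"] <= 2 and ema["energy"] > 2,
     "Take a short walk and note three things you see."),
    (lambda ema: ema["stress"] >= 4,
     "Write down one worry, then one small step to address it."),
]

def suggest(ema_response: dict) -> list[str]:
    """Return all micro-activity suggestions whose conditions fire."""
    return [activity for cond, activity in RULES if cond(ema_response)]

# Fires the second and third rules above.
print(suggest({"mood": 1, "energy": 4, "stress": 5}))
```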
Control models are often developed based on application requirements, safety requirements, and other constraints, and thus embed domain knowledge. A humanoid, mobile robotic nursing assistant for lifting and moving patients inside a hospital achieves semi-autonomous and autonomous functionalities through a behavior-driven control model [47]. The control model is adjusted to ensure user safety (for both patients and nurses) and operational efficiency.
Hybrid Model. HCAs that provide treatment-related suggestions often combine domain knowledge models with data-driven approaches. IBM Watson for Oncology [137] combines both data-driven and knowledge-driven models to provide customized decision support to oncologists for diagnosis and treatment plan selection. Specifically, it provides an interactive, context-aware interface for information visualization and summarization using natural language inference and knowledge integration. Upon logging into the system, oncologists can view and browse the relevant medical information for each of their patients, including but not limited to medical history, family history, test results, suggested treatment options, and knowledge curated from recent and historical cases similar to the current patient. It generates treatment suggestions based on a model trained on prior data collected from Memorial Sloan Kettering Hospital oncology records. In addition, it combines knowledge extracted from over 300 medical journals and 200 textbooks with rationales from leading oncologists. It also shows relevant statistics from the curated literature for different treatment options. In CognitiveEMS [94, 98, 126], data-driven language models and distributional semantic models are used to extract, in real time, safety-critical concepts relevant to standard emergency medical service (EMS) protocols from the spoken language collected at an emergency scene. In addition, domain knowledge from standard EMS protocols is integrated using the behavior tree data structure to provide effective and safe intervention suggestions to the responders.
In an empathetic virtual assistant for changing the drinking behavior of individuals, real-time data-driven models are combined with behavioral models from psychology and other domains [67]. The system is controlled based on (i) the perception of the user's state sensed from real-time text and video
data streams and (ii) established psychometric instruments. Empathic reactions with intervention suggestions are generated based on a predefined rule base that captures the domain knowledge of experts. For instance, the behavior change assistant decides its facial expression, head movements, eyebrow expressions, and complex verbal reflections based on the user's perceived state and knowledge of predefined behavior protocols.
4.3.2 Device-level Computing. Following the four-tier computing model [119], where Tier-1 represents the cloud (e.g., data centers), Tier-2 represents cloudlets (e.g., high-end laptops, desktop PCs), Tier-3 represents embedded devices (e.g., smartphones, wearables), and Tier-4 represents energy-harvesting devices (e.g., RFID tags), a trend in existing HCAs is the usage of Tier-2 and Tier-3 local computing to provide pervasive cognitive assistance even with low or no network connectivity and with device constraints. Tian et al. present a navigational assistant for visually challenged people that requires the user to carry a mini laptop that performs the entire computation locally [132]. It processes data from an RGB-D sensor mounted on the user's belt to detect staircases and pedestrian crosswalks, using a Hough transformation and an SVM classifier, to enable blind navigation. Similarly, local, real-time processing is performed in References [63, 117], where the user needs to carry the entire computational unit. Specifically, a blind individual needs to carry headgear containing a digital video camera and wear a smart vest that contains the processing equipment, a Micro box PC-300 chassis [117]; the vest also contains rechargeable batteries.
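As a rough illustration of this kind of pipeline, the following sketch extracts Hough-line features from a grayscale frame and hands them to an SVM; the feature design and thresholds are our own illustrative assumptions and operate on intensity images rather than the RGB-D data used in Reference [132].

```python
import cv2
import numpy as np
from sklearn.svm import SVC

# Illustrative Hough-transform + SVM sketch for staircase detection.
# Staircases tend to produce many roughly parallel, near-horizontal
# line segments; we summarize that as a small feature vector.

def line_features(gray: np.ndarray) -> np.ndarray:
    """Summarize detected line segments as a 3-element feature vector."""
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                            threshold=60, minLineLength=40, maxLineGap=10)
    if lines is None:
        return np.zeros(3)
    angles = [abs(np.arctan2(y2 - y1, x2 - x1))
              for x1, y1, x2, y2 in lines[:, 0]]
    # Near-horizontal segments have an angle close to 0 or pi.
    horiz = sum(min(a, np.pi - a) < np.pi / 12 for a in angles)
    return np.array([len(angles), horiz, horiz / max(len(angles), 1)])

# Training on labeled frames (features X_train, labels y_train) and
# prediction on a new frame; data collection is omitted in this sketch.
# clf = SVC(kernel="rbf").fit(X_train, y_train)
# is_staircase = clf.predict([line_features(frame)])[0]
```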
In addition to navigational assistants, device-level computing (Tier-2, Tier-3) is also preferred in other pervasive, mobile HCAs, especially HCAs for cognitive orthotics. A visually impaired user follows a moving robot for indoor navigation that performs its computation locally [84]; an offline phase is used to map the environment and travel routes. Neumann et al. propose KogniChef, a cognitive cooking assistant, where all computations are performed on a “6-core Linux machine” [83]. González-Ortega et al. propose a real-time system that runs locally on a PC with an attached Kinect and asks the user sitting in front of it to perform different psychomotor exercises to assess neuropsychiatric disorders and mental illnesses [39].
Pollack et al. propose Pearl, a cognitive assistant for navigational guidance and adaptive reminders for ADL, that contains a differential drive system and two onboard Pentium PCs as the