AIGuide: Augmented Reality Hand Guidance in
a Visual Prosthetic
SOOYEON LEE, NELSON DANIEL TRONCOSO ALDAS, CHONGHAN LEE,
MARY BETH ROSSON, JOHN M. CARROLL, and VIJAYKRISHNAN NARAYANAN,
The Pennsylvania State University, USA
Locating and grasping objects is a critical task in people’s daily lives. For people with visual impairments,
this task can be a daily struggle. The support of augmented reality frameworks in smartphones can overcome
the limitations of current object detection applications designed for people with visual impairments. We
present AIGuide, a self-contained smartphone application that leverages augmented reality technology to
help users locate and pick up objects around them. We conducted a user study to investigate the effectiveness
of AIGuide in a visual prosthetic for providing guidance; compare it to other assistive technology form factors;
investigate the use of multimodal feedback; and provide feedback about the overall experience. We gathered
performance data and participants’ reactions and analyzed videos to understand users’ interactions with
the nonvisual smartphone user interface. Our results show that AIGuide is a promising technology to help
people with visual impairments locate and acquire objects in their daily routine. The benefits of AIGuide may
be enhanced with appropriate interaction design.
CCS Concepts: • Human-centered computing → Accessibility; Accessibility design and evaluation
methods; Empirical studies in accessibility; Accessibility systems and tools;
Additional Key Words and Phrases: Mobile assistive technology, augmented reality, nonvisual guidance in-
teraction, people with visual impairments
ACM Reference format:
Sooyeon Lee, Nelson Daniel Troncoso Aldas, Chonghan Lee, Mary Beth Rosson, John M. Carroll, and Vijaykr-
ishnan Narayanan. 2022. AIGuide: Augmented Reality Hand Guidance in a Visual Prosthetic. ACM Trans.
Access. Comput. 15, 2, Article 12 (May 2022), 32 pages.
https://doi.org/10.1145/3508501
1 INTRODUCTION
From a cereal box to a medicine bottle, finding objects is a critical task in people’s daily lives.
For people with visual impairments, this task can be a significant undertaking. It might involve
using other sensory skills or requesting assistance from a sighted person who might not always
This work was supported by the National Science Foundation (NSF) Expeditions: Visual Cortex on Silicon CCF 1317560.
Authors’ addresses: S. Lee, Information Sciences and Technology, The Pennsylvania State University, University Park,
PA, 16802 and Rochester Institute of Technology; emails: sul131@psu.edu, slics@rit.edu; N. D. T. Aldas, C. Lee, and V.
Narayanan, Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802; emails:
{ndt5054, cvl5361, vijaykrishnan.narayanan}@psu.edu; M. B. Rosson and J. M. Carroll, Information Sciences and Technol-
ogy, The Pennsylvania State University, University Park, PA, 16802; emails: {mrosson, jmc56}@psu.edu.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2022 Association for Computing Machinery.
1936-7228/2022/05-ART12 $15.00
https://doi.org/10.1145/3508501
be available. Today, several assistive technologies help people with visual impairments become
more independent, including computer vision systems and remote-sighted-assistance applications.
Remote-sighted-assistance applications connect users to either agents or volunteers, with Aira [6]
and BeMyEyes [14] as well-known examples. Although these applications are good at solving the
task at hand, some of them come with a monetary cost [6] and raise privacy concerns resulting
from assistance requests to strangers.
However, with recent advances in visual sensing and mobile technology, computer vision-based
systems can now recognize many objects, helping people with visual impairments become more in-
dependent than before. Nowadays, these automated systems can read documents, recognize prod-
ucts, recognize faces, identify currency bills, detect lighting, read handwriting, and even describe
scenes [5,42,70].
Studies of such applications indicate that the challenges people with visual impairments have
in recognizing and identifying objects have been significantly mitigated. However, if people with
visual impairments want to acquire an object that has been identified, then the existing applica-
tions are neither helpful nor useful. The information they can provide is limited to “what it is”;
information about “where it is” is not attainable. Therefore, such systems cannot assist people
with visual impairment if they want to quickly and precisely move their hands toward the wanted
object and pick it up. Instead, people with visual impairments must depend on a non-technological
approach (exploring and fumbling with their hands) until they find and confirm the wanted object with
a hand-touch.
The task of acquiring objects becomes more challenging when faced with unexpected situations
(e.g., dropping a coin or pill and finding it on the floor), an unfamiliar environment (e.g., outside
the home), or a target that is surrounded by many similar objects (e.g., a cereal box on a grocery
store shelf). To efficiently perform the task of finding and acquiring an object, people with visual
impairment need granular and accurate directional guidance, provided quickly enough to inform
their arm and hand movement.
Researchers have actively been investigating the target acquisition task as a challenge referred
to in multiple ways, including last meter, arm’s reach distance, peripersonal space/reaching, and
the haptic space problem; the research is often situated in mundane daily activities such as grocery
shopping [13,17,44,56,94], interacting with targets of interest on a touch-based medium [31,37,
38,77,78], aiming a camera to take photos [34,43,48], and simply localizing and grasping an
object that is within the arm’s reachable range [24,45,76,86]. Various supporting technologies
have been explored, including a glove, nger-worn wearable devices, and mobile devices. Recently,
a miniature drone has been examined for feasibility in a small-scale hand navigation task (e.g.,
finding a cup on the table) [32]. However, most of these research examples are not yet ready for
fulfilling the needs of people with visual impairments in the real world, due to a range of limitations
and restrictions.
To contribute to this active research area and help people with visual impairments navigate
to and acquire a wanted object, we have been investigating the feasibility of augmented reality
frameworks, such as ARKit and ARCore. They can provide information about the relative location
of objects in three-dimensional (3D) space and track the object even when it gets out of the camera
view. They can also produce a real-time estimate of the phone’s pose and position (assuming use
of a smartphone) relative to the real-world based on the information feed from the camera and
motion-sensing hardware. By appropriating this technology, we developed AIGuide, a smartphone-
based application designed as a visual prosthetic. It leverages the system’s orienting features to
help people with visual impairment guide the camera that is built into the handheld mobile device
and thus receive feedback to navigate to and acquire a target object.
The application uses Apple’s augmented reality framework, ARKit, to detect objects in 3D space
and track them in real time. Using this real-time data, the application calculates the position of the
hand holding the device relative to the position of an identied object. The users guide the cam-
era toward the object using speech feedback along with additional haptic and/or sound feedback
delivered by the smartphone held in the user’s hand. Thus, in our study, the smartphone is used
as a hand navigation aid system equipped with a camera that guides the users with direction and
instruction for achieving the goal of getting close enough to an object that can be grasped. In sum,
the focus of the support that our prototype system provides is on the process of navigating to the
object.
To address privacy concerns, our application does not transmit the acquired camera images to a
remote server, and all the computations are done locally. This work leverages prior work on model
creation and focuses on navigation and object picking support. The third-party application used
for model creation currently lacks accessibility features. However, model creation speed lends itself
to scalability. We scanned 20 additional objects with similar sizes, 10 cereal boxes, and 10 cracker
boxes to measure scanning times. On average, a model creation took 1 minute and 29 seconds.
Integration of accessibility-aware model creation approaches such as those in References [4,29,35]
will significantly enhance the utility of our application. Kacorri et al. presented a method for people
with visual impairments to train a personal image classification model with photos taken by the
people to classify objects specific to the user [35]. They found one of the main challenges in training
custom image classification models is the performance degradation due to the absence of the object
of interest from training images taken by blind users. To solve the difficulty for people with visual
impairments to capture well-framed photos, Ahmetovic et al. proposed ReCog, a mobile app for
custom object recognition with a camera-aiming guidance module that tracks target objects and
provides instructions to photograph [4].
We conducted a user study with 10 participants with visual impairments (9 totally blind and
1 legally blind) to evaluate the functionality and the overall user experience when AIGuide is used
to locate and grasp objects. The user study was carried out remotely in participants’ homes (using
Zoom) due to Covid-19. To compensate for our limited control attributed to the remote setting
(we had originally planned a study in a controlled lab environment), we replaced the original task
scenario of “finding a wanted product on a simulated grocery store shelf” with “finding a purchased
product among a few items in a home environment.” This necessary modification made the task
simpler than the original design; however, the study results suggested it still provided the partici-
pants with significant experience with the AIGuide prototype system, and allowed us to investigate
and evaluate its usefulness and usability, including:
Eectiveness of the navigation aid that AIGuide prototype system provides for a non-visual
guidance;
Eectiveness of multi-modal feedback delivered via smartphone for guiding the user’s hand;
Overall experience of AIGuide prototype system in regards to its performance; and
Experiences of interacting with a smartphone used for non-visual hand guidance.
Our results show that AIGuide exceeded users’ expectations and can effectively locate and guide
users to the target object. Users found the feedback provided by the application appropriate. They also
appreciated the ability to customize that feedback. An interesting finding was that although a spe-
cific feedback channel was more efficient at guiding them, they sometimes preferred another one.
They found the location information particularly useful for understanding the relative location of the
object. Moreover, users envision AIGuide as the “big thing” in consumer-facing visual prosthetic
applications for finding objects. Finally, the quantitative results confirm that AIGuide is efficient with
the task, and design considerations for nonvisual interaction with a smartphone emerged from the
qualitative video data analysis.
2 RELATED WORK
2.1 Object Finding Approaches
There are two object detection mechanisms mainly used by camera-based assistive applications
for object nding: human assistive object detection and automatic object detection based on com-
puter vision algorithms. Object nding applications using human assistance exploit crowdsourc-
ing or sighted human agents providing real time feedback. Well-known applications are Aira [6],
BeMyEyes [14], and VizWiz [15]. Aira employs professional agents that assist users through a con-
versational app interface. BeMyEyes connects users to crowdsourced volunteers. VizWiz accepts
photos and questions from users and provides feedback through text.
In recent years, computer vision-based applications, driven by advances in visual sensing and
mobile technology, have improved dramatically. Nowadays, these applications can help users
identify products (by scanning barcodes), read documents, find people (by recognizing faces), iden-
tify currency bills, detect lighting, read handwritten text, and get a scene description [5,42,70,93].
However, one feature that these systems lack is the ability to provide the relative position of the
object of interest. As a result, they cannot orient the user, which prevents the user with visual
impairment from engaging in the desired interaction with the object of interest. Fur-
thermore, these systems do not address the issue of locating and acquiring objects within arm-reachable
distance, known as the last meter problem [48]. Recent work by Morrison et al. [52] prototyped
a computer vision–based system that processes the real-time image feed for orienting the system,
which is similar to our system. However, their system was designed to assist with a blind child’s so-
cial interaction (e.g., conversation engagement). In contrast, our system helps to orient the user’s
hand location relative to the position of the object of interest, with the goal of assisting people
with visual impairment to navigate their hand close enough to an object to pick it up.
To address this issue, several systems have been proposed. Zientara et al. [94] designed ThirdEye, an
automatic shopping assistant system that recognizes grocery items in real-time video streams us-
ing cameras mounted on glasses and a glove. Another example is the system proposed by Thakoor
et al. [79], which utilizes a camera mounted on glasses, bone conduction headphones, and a smart-
phone application. This system takes visual input from the camera, processes that information on
a backend server to detect and track the object and provides auditory feedback to guide the user.
The drawbacks of such systems with specialized hardware are limited scalability, bulkiness
that might be impractical for daily use, and the need for a wireless connection to either the internet
or a server. Bigham et al. [15] proposed a system where the user captures an initial scene im-
age, requests annotations from a crowdsourcing service, and utilizes that information for guidance. However,
this asynchronous system depends on the quality of captured images, crowdworker availability,
and an internet connection, and it raises concerns due to strangers interacting with the users.
In these regards, AIGuide is a self-contained smartphone application; it does not need external
hardware nor an internet connection. It provides the relative position in 3D space of objects in
real-time. It guides the user to the object of interest by using haptic, sound, and speech feedback.
Furthermore, it leverages technology, i.e., augmented reality frameworks, currently supported by
millions of phones available in the market [16].
2.2 Hand Guidance
Directional hand guidance applications assist a variety of tasks for users with visual impairment,
including physically tracing printed text to hear text-to-speech output [72,78], learning hand
gestures [59,75], and localizing and acquiring desired objects [41,48,62,79,94]. There are two dif-
ferent approaches for directional hand guidance: non-visual and visual. For non-visual directional
hand guidance, previous works focused on exploiting audio and haptic feedback to find targets and
trace paths with visually impaired users’ hands. Oh et al. exploited different attributes of sound
and verbal feedback for users with visual impairment to learn shape gestures [59]. Sonification has been
used to guide users with visual impairment to reach targets in their peripersonal space [62]. Tactile
feedback from a handheld device is exploited to find targets on a large wall-mounted display [41].
Wristbands with vibrational motors are used for target nding and path tracing [31].
For visual directional hand guidance, visual information was collected from cameras to provide
multimodal feedback for hand guidance. Text recognition, along with audio and tactile feedback,
is exploited for a finger-worn text reader. Well-known examples are FingerReader and HandSight,
allowing users to physically trace printed text to hear text-to-speech-output [72,78]. Thakoor
et al. used a camera mounted to wearable glasses to provide audio feedback based on a camera
field of view to allow users to reach target objects [79]. Stearns et al. used HoloLens to magnify
text and images collected from a finger camera for users with low vision [77]. To identify and track
target objects, Zientara et al. used a camera mounted on a glove to guide hand movements and the
orientation of the users and to send video streams to the server, where computer vision algorithms
analyze them [94]. Although previous works on hand guidance to a target found handheld devices
effective for automatic hand tracking, these systems lack precise hand location information, which is
critical in real-world environments. We tackle the problem by shifting it from tracking the
hand to tracking the location of the phone, held in the user’s palm, in an augmented space,
to obtain a precise hand location.
2.3 Nonvisual Guidance Interface
Dierent types of directional information have been evaluated for assisting users with visual im-
pairment, largely for independently navigating indoors or outdoors and hand movements to a
target of interest. The auditory channel, haptic modality, or a combination of both has been the
major means for presenting the information needed to assist navigation [12,18,28,37,39,54,60,
78,82,92].
Both verbal (e.g., speech) and non-verbal feedback (e.g., sound, tone) have been commonly used
and assessed for communicating direction and distance information. Some researchers evaluated
the eectiveness of these auditory types of feedback for the task of exploring and understanding
2D surfaces. The work of Kane et al. [37,38], Stearns et al. [78], Oh et al. [59],andShilkrotetal.
[72] are examples. The task of guiding a hand to a target so as to touch or grasp the object has also
employed both verbal and non-verbal cues [12,23,24,39,76,82,86]. More specifically, the task of
grocery shopping has been frequently examined due to the complexity of the task [17,44,56].
Haptic feedback has been investigated and evaluated as either an alternative or a complement to
auditory feedback, with the hope of mitigating the overload of information on auditory channels
(hearing is a major way of receiving and processing information for people with visual impair-
ments). One focus has been on wayfinding in indoor or outdoor environments, as well as hand
guidance for object localization and acquisition with two types of haptic modality: vibration and
force-feedback. The parameters of frequency, intensity, location, and pattern have been used to
convey information about direction, distance, and obstacles, along with exploration of different
form factors such as bracelets [19,31,36,61], belts [22,30,85], shoes [83], gloves [44,46,91,94],
handheld devices [10,71,87], finger-worn devices [25,68,78], canes [84], and tiny drones [32].
Along with considerable research on unimodal feedback (either auditory or haptic), multimodal
feedback that combines these two types has been investigated and compared with singular feed-
back. The goal has been to assess if multimodality is more eective than unimodality. Researchers
combined the two types (auditory and haptic) to deliver them in a redundant form or complemen-
tary form for presenting the same navigation information (e.g., speech or beep and vibration at
the same or different time for the directional information). Several research groups [41,44,66]
evaluated the directional information delivery with the speech and haptic (vibration and shoulder
tapping) feedback delivered at the same time, for the tasks of hand guidance and wayfinding
navigation support, respectively. They found that multimodal feedback performed better than or as well
as speech feedback alone. The researchers explained that the combined representation made the
information more conspicuous and lessened cognitive overload, because users with visual
impairments can process either modality as an alternative.
Soto et al. [8] conducted a comparative study using a drone as a navigation aid. This study found
that non-verbal feedback (the sound from the drone) was more helpful when it is simultaneously
delivered with haptic feedback (a pull on a leash). Furthermore, a signicant body of work has
explored the specific characteristics of different modalities, considering whether and how their
strengths and weaknesses match well to specific navigation needs. Manduchi et al. [47] evaluated
hand guidance to a target on the wall, using beeps that varied in frequency and loudness to convey
distance to a target and the position of the target’s center, respectively. Vibration also was used as
a cue for correcting the angle of the camera relative to the target. Finally, Merabet et al. [50] used
different feedback modalities to present different types of information. In their study, landmarks
were presented with a custom auditory display, a beep was used to signal a 30° turn, and vibration
was used to alert to detected obstacles.
For our prototype hand guidance system, we took the findings of this prior work into consideration
in our feedback interface design. Similar to those works, we implemented different types of feedback
modality (speech, sound, haptic) for the different kinds of navigation information (direction, distance, cor-
rection), with the goal of maximizing each modality’s strengths and having the modalities complement each other in a
way that produces the greatest possible synergistic effect for a hand-to-target task aid.
Speech is used to present the directional information (left, right, up, and down), and it is
provided only when the position needs to be corrected because the target has moved out of the
camera view. To signal the hand-to-target distance, we use the frequency of the sound or
vibration presentation, and these can be conveyed individually or concurrently per the user’s choice.
To indicate that the target is out of the camera view, a distinctly designed vibration
cue is delivered. To the best of our knowledge, our prototype is the first example that makes use of
the real-time navigational information enabled by an ARKit-powered handheld system and maps
it to various feedback modalities presented in a complementary way. This approach avoids overloading
a single modality with multiple kinds of navigation information, preventing interference and conflicts, while
at the same time allowing the feedback to be strengthened by delivering different feedback modalities concurrently
when needed.
2.4 Mobile-based Assistive Technology
The ubiquity of smartphones among people with visual impairments [51] has generated a large
body of research and development of mobile device-based assistive applications. These systems
accommodate tasks, including object recognition [5,6,14,15,70,93], object search [6,14,15],
text recognition [6,14,37,42,70], and navigation [3,9,20,25,40,47,53,63,64,67,87,89]. Most
of these systems require an internet connection [5,6,15,42,67,70,74,93], and some of them
require augmenting the physical environment with sensors or landmarks [15,49,53,67]. For users
in locations with unreliable internet connections, online applications can become frustrating; this
makes offline applications an attractive alternative. Although systems that require augmenting the
physical environment effectively assist users, such systems are impractical to deploy widely
due to cost and maintenance. In this regard, AIGuide is a self-contained smartphone application
that does not require a custom infrastructure, and it does not need a cellular or Wi-Fi connection.
2.5 Mobile-based Nonvisual Guidance Interactions
Various assistive applications and built-in screen reader features such as VoiceOver [1] and TalkBack
[2] have made smartphones much more accessible to people with visual impairments. Due to their
widespread use, many mobile-based technologies have been researched and developed to guide
people with visual impairments [3,9,20,25,40,47,53,63,64,67,87,89]. Many studies have
focused on analyzing different types of interactions, such as touchscreen interaction and one-hand
operation, to accommodate eyes-free interaction [9,26,40,58,81]. Input and output modalities
that can support the non-visual interaction have been investigated.
Modern smartphones make use of touchscreen interfaces to interact with the user. For peo-
ple with visual impairments, these interactions are performed commonly through gesture-based
screen readers. Grussenmeyer et al. [26] provide an overview of the state of touchscreen accessi-
bility, including smartwatches, tablets, and smartphones, along with the output provided by those
devices. Additionally, they present open research areas such as 2D vs. 3D content accessibility, collabo-
rative work between blind and sighted users, and the accessibility of large touch screens. Kim et al.
[40] propose a smartphone-based navigation system that enables one-hand operation by touching
and shaking the smartphone. Touch is used to retrieve information and select items, while shaking
is used to navigate between windows. Yoon et al. [89] proposed an augmented reality application
that retrieves visual information from the environment to enable users to retrace paths. They note
that co-design partners of the app were people with visual impairments who used either a white
cane or a guide dog, so one-hand operation was important. They discovered that the positioning
of the device was important for the performance of the app, and how to provide live feedback to
the user about the device’s positioning remains an open question. Another wayfinding
application that uses visual processing of the environment is Manduchi’s system [47], which re-
quires users to scan their environments to locate visual markers. A limitation of this system is that
it requires users to constantly rotate their phones to find the target.
In addition to the investigation of an input aspect of an interaction, these systems examined
and assessed the auditory and haptic channels for output modalities. Specifically, they make use of
speech, sonification, and vibration patterns, singly or in combination. Azenkot et al. [9] proposed a
haptic-only guidance system where the user performs a gesture on a smartphone and the device
reacts by generating vibrations. Pielot et al. [63] created a tactile compass that produces vibration
patterns to aid in the navigation of outdoor spaces. Ahmetovic et al. [3] explored the sonification
of rotation instructions to provide accurate directions. NavCog [67] leverages a mobile application
and Bluetooth beacons to give turn-by-turn verbal instructions. Other systems use a combination
of both auditory and haptic modalities to guide the user. Ghiani et al. [25] make use of a pair of
finger-worn haptic motors connected to a mobile device that convey directional information by
emitting vibration patterns to the thumb and index finger, and complement it with short audio
descriptions. SpaceSense [87] conveys spatial information by using an array of vibration motors
attached to a smartphone, while providing turn-by-turn instructions using speech. Clew provides both auditory
and haptic cues to retrace paths made by the user [89].
In our work, we explored the use of multi-modal feedback channels that work complementarily with each
other, each playing a role for different guidance information at the same time, for guiding the hand
to the target object. AIGuide provides an interface that allows the participants to interact with
their smartphones. Touching, tapping, swiping, and shaking are the input modalities enabled for
interface access. Our experimental and observational study allowed us to fully analyze how people
with visual impairments use and interact with the smartphone to make full use of the guidance that
Fig. 1. Screenshots of AIGuide. (a) Selection interface. (b) Guidance interface. (c) Settings interface. (d) Tuto-
rial interface.
AIGuide provides, and how and why they found AIGuide’s guidance useful and exciting, particularly
in the context of smartphone use as a hand guidance aid. We present insightful findings, the
design considerations guided by those findings, and what they mean for the future design direction
of smartphone-based nonvisual hand guidance and supporting interaction.
3 APP DESIGN
AIGuide is an iOS app that assists people with visual impairments in localizing and grasping objects in
their surroundings. It is powered by ARKit, which both detects and tracks objects, even when they
do not remain inside the camera view. In this section, we describe the app interface and hand-to-
object guidance ow.
3.1 App Interface
AIGuide provides directional instructions to navigate the user’s hand to the desired object. It uti-
lizes a combination of speech, sound, and haptic feedback to accomplish its goal. It was designed
with VoiceOver in mind, a gesture-based screen reader accessibility feature integrated into iOS
devices. It has four sub-interfaces: object selection, guidance, settings, and tutorial (see Figure 1).
3.1.1 Object Selection Interface. The object selection interface is used to choose the object of
interest. By default, when users open the application, they are taken to this interface. It consists
of a table and search bar. The table presents the names of objects in a scrolling, single-column list
of clickable rows. Users can filter results using the search bar, which for iOS devices includes voice-
to-text input. In the current version, the list is populated by a database inside the app. However,
in future versions, we envision that AIGuide will allow users to extend this list by downloading
additional models or training the app to detect personal objects. Another alternative is to allow
third parties to provide their models. For example, a grocery store might allow users to download
models of products that they carry. In our experimentation, we utilize a third-party application for
creating the models.
3.1.2 Guidance Interface. The guidance interface is the core of the app. Through this interface,
the app detects and localizes the target object so that users can guide their phones to the item
and confirm it is the desired item. To interact with this interface, we provided two labels and
four buttons. Although all the information is provided to the user by the speech synthesizer, we
included labels with relevant information such as the current instruction, contextual information,
and object location. This way, a user can easily retrieve this information with the VoiceOver feature
if they want to be reminded of a current instruction or get some additional information.
At the top of the screen is the General Information Label. This label contains guidance instruc-
tions and notifications. Examples of the content shown here are contextual content like “Please
slowly move the camera in front of you. I will tell you when I find the item” or “You got it! You have
ITEM. You can go back to the selection menu.” While in guiding mode, it will reflect the current
instruction, such as “Left,” “Up,” “Down,” or “Right.”
Below the General Information Label, we have the Location Label. This label always reflects the
current position of the item relative to the phone’s camera view. This information might also be
helpful for users to get an idea of the relative position of the item to them. An example of text here
might be “ITEM is 2 feet away, 30 degrees left and 5 inches below the camera view.”
In the middle of the screen is the Guide/Stop button. Depending on the phase that a user is in,
this label will toggle between Guide and Stop.
Finally, at the bottom of the interface are the restart and exit buttons. The exit button is used to
return to the selection menu. The restart button is used to restart the whole guidance process to
the selected item.
3.1.3 User Settings Interface. This interface offers a convenient way to adjust the user’s experi-
ence. It includes two toggle switches to turn haptic and sound feedback on/off, a slider to control
the speaking rate, and a submenu to choose the measurement system (i.e., metric or imperial),
which by default matches the setting of the user’s phone.
3.1.4 Tutorial Interface. This interface allows users to experiment with the application without
a real target object. In other words, this interface simulates the interactions that a user would have when
guided to a real object. It consists of 11 pages of content that give an overview of the app and
demos every stage of the hand-to-object guidance. This interface was included based on feedback
from our pilot study.
3.2 Hand-to-Object Guidance Flow
The process of guiding the user’s hand to an object consists of four phases: selection, localiza-
tion, guidance, and confirmation. The distinct steps were included as part of a continuous process
designed to guide users and offer them options for their desired levels of interaction (feedback and re-
quests). However, all these steps are optional and can be combined. The desired number of phases
was not part of this evaluation and remains to be explored. Figure 2 depicts a diagram of this
process.
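For readers who prefer code to the diagram, the following is a minimal Swift sketch (not the authors’ implementation) of how the four phases and their triggers could be modeled as explicit states; the type and method names are hypothetical.

// A minimal sketch of the four guidance phases and their triggers; names are hypothetical.
enum GuidancePhase {
    case selection      // user picks the target item from the list
    case localization   // app scans camera frames until the item is detected
    case guidance       // app emits speech/sound/haptic cues toward the item
    case confirmation   // user shows the grasped item to the camera to verify it
}

struct GuidanceFlow {
    private(set) var phase: GuidancePhase = .selection

    // System- or user-triggered transitions described in Sections 3.2.1-3.2.4.
    mutating func itemSelected()    { phase = .localization }
    mutating func guideTapped()     { phase = .guidance }      // after the item has been detected
    mutating func itemWithinReach() { phase = .confirmation }  // ~20 cm away, or the user shakes the phone
    mutating func itemConfirmed()   { phase = .selection }     // back to the selection menu
}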
3.2.1 Selection Phase. The selection phase involves a user scrolling through the list of items or
using the search bar to filter the results. Once the user selects an item, they will be taken to the
guidance interface.
3.2.2 Localization Phase. The second phase is localization that occurs in the guidance interface.
During this phase, the app finds the item using the camera and tracks the location of the item
with respect to the phone. As soon as the guidance interface starts, AIGuide tells the user: “Please
move the camera in front of you. I will tell you when I find the item.” While the user moves the
Fig. 2. Hand-to-object guidance flow diagram. Users go through four phases: (1) selection, (2) localization,
(3) guidance, and (4) confirmation. The transitions between these phases are triggered by the system (e.g.,
when localizing an object) or by user actions (i.e., clicking a button or shaking the phone).
phone around, the app reminds the user that it is searching for the item by saying “Scanning” every 5
seconds. Once it detects the item’s location, it emits a notification sound and tells the user: “ITEM
found. Click the ‘Guide’ button when you are ready.”
3.2.3 Guidance Phase. After the user presses the Guide button, the app is in the guidance phase.
The app starts this phase by giving the item’s location with respect to the camera. It says: “Started
guidance. ITEM is x feet away, y degrees left, and z inches below the camera view.” Then, it starts
to give guidance feedback to the user.
AIGuide uses haptic, sound, and speech feedback for guidance. To convey that the object is in the
camera view, it emits a beeping sound and/or tap depending on the user’s settings. The frequency
of this feedback is inversely proportional to the distance to the object. As the user’s phone gets
closer to the object, the frequency keeps increasing. As soon as the object gets out of the view, a
short vibration is triggered, and the app corrects the hand position with speech instructions: “up,”
“down,” “left,” or “right.” For example, if the camera view is to the object’s left, then the app will
repeat “right” until the object gets in the camera view. If the camera view goes above the object,
then it immediately corrects the position by repeating “down.”
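As an illustration of this correction logic, the sketch below uses AVSpeechSynthesizer to speak a single corrective word once the target leaves the camera view; the sign conventions and the 10-degree threshold are assumptions for illustration, not values taken from AIGuide.

import AVFoundation

// Hedged sketch of the out-of-view correction: speak the single word that
// steers the camera back toward the target. The sign conventions and the
// threshold below are assumptions, not the authors' values.
final class CorrectionSpeaker {
    private let synthesizer = AVSpeechSynthesizer()

    func speakCorrection(horizontalAngleDegrees: Double,
                         verticalOffsetMeters: Double,
                         objectInView: Bool) {
        guard !objectInView else { return }  // corrections are spoken only when the target is off-screen
        let word: String
        if abs(horizontalAngleDegrees) > 10 {                     // horizontal error dominates (illustrative threshold)
            word = horizontalAngleDegrees > 0 ? "left" : "right"  // assumed sign convention
        } else {
            word = verticalOffsetMeters > 0 ? "up" : "down"       // positive offset assumed to mean "object above camera"
        }
        synthesizer.speak(AVSpeechUtterance(string: word))
    }
}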
When the object is in the camera view and the phone is close to the object, around 20 cm, the app lets the user
know by saying: “ITEM is in front of the camera. Click the Confirm button or shake the phone
when you are ready.” This is the end of the guidance phase.
There might be instances where the user wants to stop guidance to receive detailed information
about the object’s location. For these instances, the user can press the Stop button. If the user
presses the “Stop” button, then AIGuide tells the user: “Stop Guide. Please take a step back to
reposition the camera. You can click the Guide button to resume.” As soon as the user presses the
Guide button again, AIGuide resumes guidance and tells the user something like, “Started guidance.
ITEM is x feet away, y degrees left, and z inches below the camera view.”
Additionally, if the item is behind the camera, then the user is notified about it. This can happen
if (1) the user points the phone in the opposite direction to the object or (2) the user’s phone gets
behind the object. For (1), it just says, “It appears that the item is behind the camera.” For (2), it
says, “It appears that the item is behind the camera. Please take a step back.” During testing, case
(2) happened when the object was on low surfaces, and the hand went behind the object.
3.2.4 Confirmation Phase. Once the user is notified that the item is hand-reachable and in front
of the camera view, the confirmation phase starts. The user can double-check that the grasped item
is correct by clicking the Confirm button or shaking the phone. Either of those actions triggers AIGu-
ide to tell the user, “Please move the item in front of the camera.” Then, AIGuide tries to recognize
the item. Once it recognizes the item, it tells the user, “You got it! You have ITEM. You can go back
to the selection menu.” Since holding the item and pressing a button might be challenging, we let
the users shake the phone to trigger this stage. Additionally, the user can trigger item confirmation
at any point. This can be used, for example, if the user feels that they can grasp the item before
the app notifies them that it is close enough.
3.3 Implementation
The main components implemented for AIGuide were 3D Object Detection with Tracking and the
Guidance Library. For both components, we leveraged ARKit, Apple’s framework for augmented
reality applications. This framework utilizes a technique called visual-inertial odometry to under-
stand where the phone is relative to the world around it and exposes conveniences that simplify
the development of AR solutions. This eased the process, in our application, of detecting and tracking the
position of objects with respect to the device’s camera frame.
ARKit recognizes visually salient features from the scene image, called feature points. It then
tracks differences in the positions of those points across frames as the device moves around and
combines them with inertial measurements. This results in an estimate of the position and orienta-
tion of the device’s camera with respect to the world around it. A similar process is used for other
tools such as Vuforia and ARCore.
3.3.1 Object Detection with Tracking. ARKit allows us to record feature points of a real-world
object and use that data to detect it in the user’s environment. This feature was introduced at the Apple
Worldwide Developers Conference 2018 and is available for iOS 12+ devices with an A9 processor or
later [80]. Currently, ARKit and Vuforia are the two major developer kits natively supporting this
kind of feature [57,69].
The process starts by recording feature points and saving that data in an .arobject file. To get
this type of file, we used a utility app provided in Apple’s documentation [69]. Internally, this
application
• Extracts spatial mapping data of the environment (the same process used to track the world
around the device’s camera);
• Slices the portion of the mapping data that corresponds to the desired object;
• Encodes that information into a reference object, called an ARReferenceObject; and
• Finally, uses the reference object to create an .arobject file.
After we obtained .arobject files for every object, we embedded these files in our application.
We then configured the AR session to use these files to perform 3D object detection. Every time
ARKit detects an object, an ARAnchor is added to the session. ARAnchor is an object representing
the position and orientation of a point of interest in the real world [7]. Using a reference to that
ARAnchor, we can track the object across video frames.
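A minimal sketch of this configuration step is shown below, assuming the .arobject files are bundled in an AR resource group named “GroceryItems”; the class name, group name, and delegate wiring are our assumptions rather than code from the AIGuide source.

import ARKit

// Hedged sketch of enabling 3D object detection with bundled .arobject files.
final class ObjectDetectionSession: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        guard let referenceObjects = ARReferenceObject.referenceObjects(
                inGroupNamed: "GroceryItems", bundle: nil) else {
            fatalError("AR resource group with .arobject files not found")
        }
        let configuration = ARWorldTrackingConfiguration()
        configuration.detectionObjects = referenceObjects  // enables detection of the recorded objects
        session.delegate = self
        session.run(configuration)
    }

    // Called when ARKit detects a reference object; the resulting ARObjectAnchor
    // is tracked across frames and its transform drives the guidance cues.
    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        for case let objectAnchor as ARObjectAnchor in anchors {
            print("Detected \(objectAnchor.referenceObject.name ?? "object")")
        }
    }
}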
ALGORITHM 1: Pseudo-code to get the horizontal angle from the camera with respect to the object
procedure GHD(objectTransform, cameraTransform)
    cameraPosition = getPosition(cameraTransform)
    objectPosition = getPosition(objectTransform)
    cameraPosition.y = 0
    objectPosition.y = 0
    anchorFromCamera = normalize(cameraPosition − objectPosition)
    newTransform = transform with a translation of 1 in front of the camera
    newPosition = getPosition(newTransform)
    newPosition.y = 0
    positionFromCamera = normalize(cameraPosition − newPosition)
    dotProduct = dot(positionFromCamera, anchorFromCamera)
    vectorNormal = cross(positionFromCamera, anchorFromCamera)
    angle = acos(clamp(dotProduct, −1, 1)).radiansToDegrees
    return vectorNormal.y > 0 ? −angle : angle
end procedure
3.3.2 Guidance Library. ARKit incorporates computer vision and computer graphics tech-
niques to capture the dynamic relation of the camera and the target object. The information in-
cludes the orientations and locations of the phone and the target object. Such positional informa-
tion is utilized in AIGuide to provide simple directional guidance to the users. For example, the
angle between the camera’s view and the target object is measured to provide the horizontal di-
rection, i.e., left, right, or behind; the pseudo-code in Algorithm 1 above shows how to get the horizontal angle. The
height difference between the phone and the target object is measured to provide the vertical di-
rection, i.e., up or down. Additionally, the app provides the distance between the camera and the
object so that users can approximate how close the phone is to the object. This distance is also
measured from the two-dimensional positional difference between the phone and the object,
which excludes the height difference.
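The sketch below illustrates these two quantities under assumed conventions (positions taken from the last column of the camera and object transforms, with y as the vertical axis); it approximates the described computation and is not the authors’ code.

import simd

// Positions come from the last column of the 4x4 transforms (assumed convention).
func position(of transform: simd_float4x4) -> SIMD3<Float> {
    SIMD3(transform.columns.3.x, transform.columns.3.y, transform.columns.3.z)
}

// Horizontal (ground-plane) distance, excluding the height difference.
func horizontalDistance(camera: simd_float4x4, object: simd_float4x4) -> Float {
    var delta = position(of: object) - position(of: camera)
    delta.y = 0
    return simd_length(delta)
}

// Positive means the object is above the camera ("up"); negative means below ("down").
func verticalOffset(camera: simd_float4x4, object: simd_float4x4) -> Float {
    position(of: object).y - position(of: camera).y
}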
The haptic and sound feedback were synchronized to better inform the users. The sound feed-
back was generated using a beeping sound at a frequency of 440 Hz. The pace is dynamically
adjusted between 60 and 330 bpm to signal how close the users are to the target object. Addition-
ally, the sound and tapping feedback were only delivered when the object was inside the camera’s
viewing frustum. Finally, for user settings, we implemented an iOS settings bundle as described
here [33].
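The following Swift sketch illustrates one way the distance-to-pace mapping could be realized, interpolating between 60 and 330 bpm and firing a haptic tap on each beat; the 2-meter distance bound and the timer-based delivery are assumptions for illustration, not details of the AIGuide implementation.

import UIKit

// Hedged sketch: interpolate the beep/tap rate between 60 bpm (far) and
// 330 bpm (at the object) while the target is inside the viewing frustum.
final class FeedbackPacer {
    private let tapGenerator = UIImpactFeedbackGenerator(style: .medium)
    private var timer: Timer?

    func update(distance: Float, objectInView: Bool, maxDistance: Float = 2.0) {
        timer?.invalidate()
        guard objectInView else { return }  // no beeps or taps when the object is out of view

        let clamped = min(max(distance, 0), maxDistance)
        let bpm = 330 - (330 - 60) * (clamped / maxDistance)   // closer => faster pace
        let interval = TimeInterval(60 / bpm)

        timer = Timer.scheduledTimer(withTimeInterval: interval, repeats: true) { [weak self] _ in
            self?.tapGenerator.impactOccurred()  // haptic tap; a 440 Hz beep would be played in sync
        }
    }
}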
4 STUDY DESIGN
4.1 User Study Design
We designed a user study to (a) evaluate the accuracy and effectiveness of our application proto-
type at finding a target object and guiding the user’s hand to it and (b) collect feedback about the
performance and overall user experience. Because our access to people with visual impairments
was limited, we conducted a pilot study with six blindfolded sighted people in a controlled lab-
oratory setting and used that study to clarify and refine our experimental procedures. The user
study with people with visual impairments was conducted in subjects’ homes as a virtual obser-
vational study that used the Zoom video chat platform [95]. This choice was motivated by the
COVID-19 pandemic and led us to adopt a novel approach to overcoming the challenges of
conducting human subject research without face-to-face interaction. This revised research method
Fig. 3. The user study was performed remotely through a videoconference. Participants tested the applica-
tion in their home space. The participant in the left-side images put an iPad on the side table for the top view
and an iPhone on the table where the items were positioned for the bottom view (images on the left). The par-
ticipant in the right-side images used her laptop for the top view and put an iPhone on the chair for the bottom
view (images on the right).
was validated and refined with a person with visual impairment through continuous discussion
and a pilot simulation during the whole study design process. It enabled us to conduct the user
study with participants who have visual impairments using their home settings. Our study adapta-
tion to the virtual channel can be viewed as a secondary contribution to the accessibility research
community. The following sections present the details.
4.2 User Study: At-Home User Study Using Video Chat
4.2.1 Study Setting and Protocol. Because of the COVID-19 pandemic and our university’s de-
cision to move all operations to remote work, we redesigned the user study to be conducted in
the homes of the participants via Zoom. A revised user study design that included remote inter-
action and observation was reviewed and approved by the University IRB board. The revised methods
allowed us to conduct the experiment remotely, such that the participants with visual impairments
were observed using video feeds and interactions between the experimenters and the participants
were mediated by video chat. Importantly, the task scenario was adapted for the home space, such
that the task shifted from finding products and picking them up from a grocery store shelf, to
finding purchased products and picking them up in the participant’s home environment.
To operate within the home setting as if it were a laboratory, we prepared a study kit for each
participant that included one iPhone 11 with the AIGuide application installed and three identi-
cal grocery items (box of cereal, canister of Lipton tea mix, and box of fruit-and-grain bars) and
mailed the kit to each participant’s home. Beforehand, we communicated with the participants
and obtained their agreements and consent for the at-home study setup regarding the following five items:
(1) receipt of the study kit and its contents and a commitment to return the iPhone with return
postage provided; (2) need for coordination in study set-up through a Zoom video connection;
(3) if possible, availability of two home devices to be used during the Zoom video session (to
allow two video feed views from different angles); (4) conduct of the Zoom video chat study, with
a researcher observing the participant’s activities and comments through the video feed; and
(5) video recording of the sessions.
Once we received these agreements from a participant, we sent out the study kit to arrive a
few days ahead of the study date for that participant, allowing time to become familiar with the
iPhone 11 and to learn about the AIGuide application through the tutorials that had been added to
the application. A Zoom meeting was created, and the meeting ID and password for joining were sent
to the participant along with an explanation of how to use them.
We explained to participants that it is important to observe their performance of the trials from
two different camera views: a wide-angle view and a close-up frontal view. This is particularly critical
for a study of mobile device-based hand guidance because of the interaction of the hand with the
smartphone and how participants move their hands and reach out to a target product in a relatively
small space. Thus, we asked the participants to set up two devices in dierent locations, such that
they provide both a side view and a front view (Figure 3). The side view shows the wide scene
with the overall setting of the study; in this view the whole body, hand posture, and movement can
be observed. The front view shows the front of the participant and the three product items.
This view shows the detailed process of hand movement and touching of the target item in the
last phase of guidance. These two scene settings complemented each other and reduced the risk
of missing an important moment of the nding and reaching. It allowed the researcher to observe
the user interaction akin to a face-to-face environment, albeit remotely.
This virtual at-home user study with two video feeds, including the collaboration by participants
during set-up, allowed us to conduct the remote experimental user study with people with visual
impairments and to provide them with an experience similar to the lab study originally planned
as an end-user validation. The operation and results from all 10 of the user sessions demonstrated
that our remote study using video chat was an effective approach for formative evaluation of the
assistive technology.
4.2.2 Participants. Ten participants (M: 5; F: 5) were recruited from multiple cities through a
local chapter of the National Federation of the Blind, contacts of previous study participants, and snow-
ball sampling. Their ages ranged from 22 to 45 years old. The participant table (Table 1)
lists demographic and personal details for each participant. All of our participants have a signifi-
cant level of visual impairment (see Table 1) and are users of the iPhone with VoiceOver. All reported
that they have normal hearing except for one participant, who uses a hearing aid. None of the par-
ticipants had a problem in sensing haptic feedback using their hand. They have experience with
haptic sensation mostly through braille and cane use but also with vibrations on smart phones and
other assistive devices such as BlindSquare. None of the participants have any arm or hand motor
impairments. They participated in the study on a voluntary basis without any compensation.
4.2.3 Data Collection and Procedure. The experimental task was to find each of three differ-
ent grocery products positioned near the surface edge of a kitchen counter, a desk, or a dining
table and to reach and pick up each item. Each task was carried out three times, using the three
dierent options for navigation feedback (sound, haptic, or both sound and haptic) in combina-
tion with speech guidance. The study session including an experiment setup (20–30 minutes), the
performance of experiment (30 minutes), and a post-task interview (30–50 minutes) took 90 to
120 minutes. The study time varied mainly due to variation in time required to set up the two
devices in the right places with desired angles, as well as becoming familiar with the use of iPhone
without a home button. At the beginning of the session, the researcher asked the participant or
a family member of the participant who was present at the time of experiment to place the three
Table 1. Participant Demographics

ID  | Gender | Age | Severity                                                   | Onset of vision loss | Hearing Difficulty
P1  | M      | 35  | Totally blind (TB), no light perception                    | 20 yrs old           | None
P2  | F      | 44  | TB, a bit of light perception; retinopathy of prematurity  | From birth           | None
P3  | F      | 37  | Legally blind; glaucoma (can see contour and shape)        | From birth           | None
P4  | M      | 23  | TB, glaucoma on cornea                                     | 18 yrs old           | Hearing impairment
P5  | F      | 22  | TB, a bit of light perception; childhood brain tumor       | 5 yrs old            | None
P6  | F      | 22  | TB                                                         | From birth           | None
P7  | M      | 29  | TB, congenital low vision; glaucoma                        | 10 yrs old           | None
P8  | M      | 45  | TB                                                         | 6 yrs old            | None
P9  | F      | 44  | TB, no light perception                                    | From birth           | None
P10 | M      | 40  | TB                                                         | From birth           | None
items. The researcher suggested a dining table or a desk where the participant can be 5–6 feet
away from those places and walk toward them. Once a suitable surface was chosen, the researcher
helped the participant or family member in placing the items, letting them know whether all three
items were in the Zoom view, and adjusting the locations as necessary.
Once the products had been positioned, the researcher and participant (or family member) dis-
cussed and chose the best possible placement for the second device. The configuration of this
second device typically required numerous iterations of adjusting the vertical and horizontal an-
gles of the device, describing to the participant or family member what the view looks like, and
asking them to move the device left or right and/or upward or downward so that the researcher
could see as much of the overall view as possible. After the setup was complete, training took place.
This included Q&A from the tutorial experience, a brief description of how the AIGuide app guides
from the scanning phase to the confirmation phase, learning to use the iPhone 11 (especially for
the participants who were not familiar with iPhones having no home button), and actual trials to
raise and clarify any confusions they might have in using the guidance provided. Finally, the actual
trials were performed.
Each participant carried out nine trials, with three types of feedback provided to support the
finding and acquisition of each product. The order of trials (one product paired with one of three
feedback types) was determined by a modified Latin Square design to counterbalance feedback
types and minimize sequence effects. The location of the products was rotated in a random fashion
for each set of three trials (the researcher simply asked the participant or the family member to
change the location of the products without specifying exact order and location). The researcher
ensured that the revised location of the products was different from the previous set of trials.
At times the researchers asked the participant or family member to push the item in a specific
direction to adjust the gap between the items and get all three in the Zoom view.
Two participants were helped by a family member (a sighted spouse; a sighted sister), and the remaining eight participants managed the product setup by themselves. When everything was in position, the participant was asked to walk approximately 5 feet away from the surface holding the grocery products. The experiment results did not show a performance difference between the participants who were helped by a family member and those who were not; we surmise this is because all three items were placed close to one another and the rearranging was done quickly and randomly in either case.
We collected two types of data: (1) video recordings of the Zoom video-mediated task sessions and (2) audio recordings of the post-task interviews. The researcher used the video recording software Open Broadcaster Software (OBS) to record the video sessions. OBS was chosen to remedy problems caused by the automatic speaker-focused view of Zoom's built-in recording feature, which would have cut out parts of the participant's video whenever the researcher was talking. In addition, Zoom does not record the view of a muted speaker, yet muting one device was necessary to remove the feedback noise created when the two devices were near each other. At the beginning of the Zoom connection, the researcher obtained the participant's verbal consent for study participation, as well as for video/audio recording of the Zoom session. The verbal consent process was also recorded.
Once the nine task trials were complete, the researcher conducted a semi-structured interview with the participant. The interview questions focused on the helpfulness and usefulness of the guidance provided by the prototype, the types of information provided, and the different types of feedback that had been experienced. From the video recordings, we collected performance data such as the time taken in each phase of the guidance and the number of failures. We were also able to observe participants' reactions (both verbal and non-verbal) to specific features at particular moments of task performance.
5 RESULTS
From the video chat recordings, we collected quantitative and qualitative data that helped us evaluate the effectiveness of AIGuide's hand guidance, which builds on the calibration and localization capabilities that ARKit enables. Through participants' self-reported feedback, we were also able to examine the assistive experience that people with visual impairments had when using AIGuide to find and obtain a target. We describe the details in the following sections.
5.1 Quantitative Analysis
To assess and compare performance and the effectiveness of the three feedback modes, one researcher measured the time each participant took to finish each trial by watching the video. To increase precision, we timed each phase of the guidance separately: scan to find the target -> guide -> scan for confirmation. We also noted and counted the following errors: (1) incorrect item retrieval, (2) incomplete or aborted trials (e.g., stopped at the participant's request), and (3) incomplete or unattempted confirmation (when this phase was stopped on request due to a lengthy scanning process).
5.2 Performance
Our experimental goals concerned the effectiveness of and preferences among the feedback types, as well as the more general question of prototype usability. Each item and each feedback type was tested in at least 30 trials (10 participants × 3 trials each); a few items and feedback types had additional trials due to retrials or extra trials requested by the participant. Note that for the performance time analysis, we did not include the time spent in the scanning phase, because the guidance begins only after object detection. We included successful trials only. Two data points were missing: one due to occlusion in the video footage and another because the participant failed to acquire an item but chose not to try again.
On initial inspection, the average task completion times and standard deviations (seconds) for the three items (box of cereal (M = 20.03, SD = 20.10), canister of tea mix (M = 24.60, SD = 30.15), and small box of fruit-and-grain bars (M = 23.30, SD = 17.23)) suggest that the prototype system might perform better with the relatively large, box-shaped object (the cereal box, the largest product in the set). However, all three items led to fairly similar performance times, and the standard deviations were extremely large. A one-way Analysis of Variance (ANOVA) on item type found no statistically significant differences across items.
The average task completion times and standard deviations (seconds) for the three feedback types (sound only (M = 19.61, SD = 11.09), haptic only (M = 28.11, SD = 37.16), and combined sound and haptic (M = 20.60, SD = 12.29)) also suggest some interesting differences. However, the standard deviations were again quite large, and a one-way ANOVA on feedback type found no statistically significant differences. Given the small sample size, the variability was too great for any of these differences to be significant. We will need to gather further data to determine whether sound-only and sound-plus-haptic feedback are particularly helpful for participants with visual impairments performing this task.
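For readers who want to reproduce this kind of check, the sketch below computes a one-way ANOVA F statistic by hand for three groups of completion times. The sample values are placeholders, not our raw data, and in practice a statistics package would also supply the p-value.

```swift
import Foundation

// Minimal one-way ANOVA F statistic for k groups of completion times.
// The sample values below are placeholders, not the study's raw data.
func oneWayAnovaF(groups: [[Double]]) -> Double {
    let k = groups.count
    let n = groups.reduce(0) { $0 + $1.count }
    let grandMean = groups.flatMap { $0 }.reduce(0, +) / Double(n)

    // Between-group sum of squares.
    let ssBetween = groups.reduce(0.0) { acc, g in
        let mean = g.reduce(0, +) / Double(g.count)
        return acc + Double(g.count) * pow(mean - grandMean, 2)
    }
    // Within-group sum of squares.
    let ssWithin = groups.reduce(0.0) { acc, g in
        let mean = g.reduce(0, +) / Double(g.count)
        return acc + g.reduce(0.0) { $0 + pow($1 - mean, 2) }
    }
    let dfBetween = Double(k - 1)
    let dfWithin = Double(n - k)
    return (ssBetween / dfBetween) / (ssWithin / dfWithin)
}

// Placeholder completion times (seconds) for the three feedback modes.
let soundOnly: [Double]  = [18.2, 21.5, 19.0, 20.4]
let hapticOnly: [Double] = [25.7, 31.2, 27.9, 28.3]
let combined: [Double]   = [19.8, 22.1, 20.6, 21.0]
print("F =", oneWayAnovaF(groups: [soundOnly, hapticOnly, combined]))
```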
5.2.1 Errors. We considered a task to have failed if the wrong item was grasped or the task was incomplete, and we calculated the incidence of these cases. For this analysis, the task did not include the confirmation phase, because we implemented that phase only as an additional step to reassure participants about item retrieval. We observed only two failed tasks out of more than 90 trials: both P1 and P4 reached out to the wrong item when using the haptic-only feedback type with the tea mix. These failures occurred not because the guidance was problematic but because the items were positioned too close to each other. This overall result indicates the high accuracy of the AIGuide app in finding an object and guiding participants to acquire it.
5.2.2 Confirmation Phase. We also tracked the number of times that the confirmation phase was skipped or stopped by the participants due to excessive delay in getting the confirmation. There were 18 such cases. Specifically, there were 3, 7, and 8 occurrences with the box of cereal, the canister of tea mix, and the box of fruit-and-grain bars (which comes in a smaller box than the cereal), respectively. It was not clear how these might have been related to either the product type or the feedback mode.
5.3 Qualitative Data Analysis: Interview
After the trials, the researcher conducted a semi-structured interview to understand the participants' overall experiences of the guidance provided. The interviews probed the three phases of each task (scanning, finding, and confirming), as well as the general usefulness of different components of the guiding interface. More specifically, we evaluated the usefulness and helpfulness of the following user interface elements: (1) item location with distance and direction information, (2) the different feedback modes, (3) status and error recovery features, and (4) the confirmation phase. The participants were also invited to provide open-ended feedback about the assistive experience they had with AIGuide, with a focus on the guidance, the user interaction interface, and its use as a smartphone application. In the following, we summarize general themes drawn from the comments in the interviews (all were transcribed) as well as behavior notes we made when reviewing the task videos.
5.3.1 Finding Phase. During the finding phase of the guidance, we observed how participants interacted with their phones after hearing the first instruction, i.e., “Please slowly move the camera in front of you. I will tell you when I find the item.” We observed that some participants appeared to become frustrated or uncertain when the scanning time was lengthy, and some chose to restart the process. We provided the user with a Restart button that allowed them to easily restart the process when they felt uncertain about the scanning progress; this restart was not considered a guiding
failure. Although participants appreciated the app-provided feedback about what is going on, delivered as a “Scanning” message emitted at regular intervals, some participants later brought up these lengthy cases as an app feature that needs to be improved. P9 shared what she thought during this lengthy scanning process and made suggestions: “What my question would be [is] how long will it just keep scanning until it can’t? Does that ever time out? Or does it ever tell you, you know, try a different area? Should it time out after a certain like—on the tea, you know, we both knew it was there, but if we didn’t and it was still scanning, I would have eventually wanted to—maybe to say, ‘Idiot, move, it’s not over here, look somewhere else.’” P7 commented that directing the phone’s camera to the right area would be a real challenge in larger spaces.
5.3.2 Hand Navigation Phase. Participants also provided comments about the guiding phase, i.e., the interval between the start of navigation and the grasping of the object. They commented on how helpful and useful the application was, and some participants commented on its efficiency. P2 said, “Because it’s pretty quick, that’s good. Because people don’t want to spend too long to find something.” P1 added, “It was responsive that it finds the items pretty quickly. . . how detailed and specific it is.” Detailed guidance, with an indication of distance in terms of how far away and how far out of the path the hand is, along with other estimated distance and location information, provided the participants with a positive experience. P4 said, “How far away it is, and what direction to kind of lean towards. And the secondary, I think, part of the directions, is the haptic or the sound. Because that tells you how, you know, you’re getting closer.” P9 commented on the signaling for correcting an out-of-path movement: “It would correct you, like ‘okay, you’re too far to the left, go to the right.’ So, it does self-correct. Or it corrects you when you put it that way.” P7 felt the guidance had a good combination of information: “Good combination because you’ve got the first direction, and the guiding, which is, it’s two feet, you know, five inches to the left.”
5.3.3 Location Information Representation with Measurement. Eight of 10 participants found the location information helpful and useful, especially at the beginning of the guiding phase. The information helped them to generate a good estimate of where the target is and how much, and in what direction, to move. P5, P9, and P10 said, “They were extremely helpful,” “I love this,” and “It’s handy, I think that’s useful.” P10 also elaborated on how helpful the information was: “Rather than being surprised by direction, being like which side again? It gives enough detail that you can snap yourself to attention as supposed to think your phone is jibber-jabbering.” P8 shared how he used the information: “When it said it was like 36 inches away and 14 degrees to the right, I knew that I needed to angle the phone a little bit to the right, to get it flat and lined up.”
However, there were two participants who did not find the information as useful. The reasons they provided are related to the numerical representation of the distance and location. P7 said, “Distance information was not useful for me. . . I didn’t really understand what it said. . . I’m fully blind so, like, from a blind perspective it’s about sound to me.” Similarly, P6 attributed problems to the space where the AIGuide app was used: P6 was quite close to the target object’s location because of the limited space in this participant’s setup. However, P6 expressed appreciation for the feature allowing a change in measurement unit (imperial or metric). In addition, some participants were not sure whether providing degree information was useful. P1 said, “. . . The degrees. That was little bit wishy-washy, like okay, degrees, what do they mean, like degree of tilt?” P2 said, “that part may not be as helpful, because people don’t. . . a lot of people aren’t gonna have an accurate sense of degrees.”
5.3.4 Feedback Types. AIGuide always provides its directions to the user via speech. In addition, it provides three feedback modes: sound only, haptic only, and a combination of sound and haptic. We were particularly interested in comparing the effectiveness of and preferences among these three types of feedback, which the smartphone provided for the task of getting to an object and grabbing it. All participants reported that the indication of how close or far the target is from the phone, and whether the target is in the camera view, was helpful and effective across all three feedback modes. Augmenting the speech-based directions with sounds or haptic cues was experienced as making the guidance responsive and adequate. Another common theme among participants was the practicality and utility of haptic cues for situations in which a sound is not easy to hear, or for someone who has both visual and hearing impairments. The more specific experiences and preferences for each feedback mode are described in the following sections.
Sound-Only Feedback Mode. Only 2 of the 10 participants said they preferred the sound-only option. P6 liked the sound feedback more than the haptic, because she is generally more aware of auditory feedback than haptic. She also mentioned that she gets more notifications from her smartphone using sound than using vibration. In her comparative comments, P6 also provided a more abstract explanation: “Sound is more defined and haptic cannot be loud.” She said she liked the speech feedback (left, left, right, right) very much. Her performance data from her trials underscore this preference, clearly showing that sound feedback works better for her: her average times with sound and haptic were 20.3 seconds and 66.3 seconds, respectively. P10 preferred the sound when considering the need to hold the smartphone carefully to receive haptic feedback. He said, “Rather picking a comfortable grip and listen for a ding ding ding ding ding ding. . . you don’t have to worry about your phone jumping out your hand at some points.”
Haptic-Only Feedback Mode. Among the 10 participants, three preferred haptic over sound or sound+haptic feedback. The main theme expressed in their comments was auditory information overload; hearing is their primary channel for receiving information. P8 said, “I get tired of listening to—constantly having to listen. . . like speech feedback telling you left and right is okay, but constantly listening to the beeping is annoying.” P3 made similar comments: “Extra second needed to think about what happened. . . haptic is easy.” P2 elaborated on this aspect and said, “Haptic gives you more feedback and more consistent. . . because the haptic you can also hear too.” In contrast, she made the following points regarding the sound feedback: “The sound, just the sound doesn’t. . . doesn’t give you constant information, sound is affected by a lot of factors. . . surrounding noise, volume, other types of audio info.”
Combination of Sound and Haptic Mode. Concurrent sound and haptic feedback was preferred by 5 of 10 participants. They said that this combination has the benefit of not only offering more information but also providing an alternative (backup) for cases when one sensory channel must be used for another interaction (e.g., talking with a friend). P1 and P5 emphasized the mutual reinforcement that the combination of the two produced. P1 said, “Both of them combined maximized the potential to get the object.” P5 also said, “I always like great amount info as possible. . . I found it super helpful.” P9 said, “Because you get it both ways, so you know if I’m not exactly paying attention. . . You’re just not quite paying attention, like if you were on the phone talking to somebody.” Interestingly, whereas the participants who preferred haptic-only or sound-only feedback showed a corresponding difference in their task performance, the performance of the participants who preferred the combination was similar across all three feedback modes. It may be that the test tasks were not challenging enough to require the extra information of combined channels, but that the combination was nonetheless able to create a more comfortable or confident performance.
5.3.5 Confirmation Phase. Most participants did not find the confirmation phase useful; rather, they found it confusing, for the following reasons. First, they did not need confirmation of the products in our experiment, which involved a limited set of three products that were distinctive in shape and size. P9 said, “Three totally different sizes and shapes, I didn’t have to use it as much.” Second, it takes a while to finish the confirmation process, because it involves a scanning process and requires the user to align the camera and ensure sufficient space between the camera and the object for the system to be able to identify and confirm the target. P5 implied that the process was long with her comment: “Quick confirm preferred but it was ok.” Last, the gesture input interaction that involved shaking the phone, without feedback, was not intuitive and created an extra step. P4 said, “There wasn’t feedback that it was changing to confirming, so like, when you’re starting the guide, you hit guide, and so my instinct would have been to confirm, like hit the button and then confirm. The shaking is different for me, because I didn’t hear it say confirming or you know, anything like that. So, there wasn’t any audible feedback on that part.” Additionally, P8 said, “it doesn’t seem like a natural thing to do if you’re using your camera to find something to shake your phone.”
5.4 Qualitative Data Analysis: Video Observation
We recorded the entire Zoom video session for each of the 10 participants with visual impairments (30–50 minutes). As detailed earlier, the video recordings merged two different camera angles, allowing us to observe the participants from two sides. Using these video data, we performed a qualitative video analysis using heuristics suggested in Reference [65]. We transcribed both verbal and visual elements of the videos, employing a sequential analysis approach with two units of analysis: the phase of guiding support (selecting, scanning, navigating, confirming) and the trials with different products and different feedback types. The goal of the video analysis was to understand in depth how people with visual impairments use and interact with a smartphone, particularly in the context of the phone being used as a hand guidance assistant. In the following, we present our findings from the iterative thematic analysis.
5.4.1 Hand-held Interaction: On-the-go Holding and Moving. In the early scanning phase, it appeared that most participants made an effort to hold the phone as upright as possible. For example, we noted that they often flexed their wrists to hold the phone upright. In fact, most participants used two hands to hold the phone when scanning, not with their arms extended, but with their two arms held naturally by their sides and the phone positioned at about the waist. They then slowly and carefully moved the phone from one side to the other until notified that the target was found. A few participants used a single hand to hold the phone, with the palm on the back of the phone or over the screen (Figure 4). When the scanning took a while, these participants tended to switch to two hands, go back to one hand, then switch to the other hand. This might be an indication of discomfort and tiredness caused by prolonged holding with a flexed wrist. We also noted that participants were careful to grip the phone such that their fingers and hand did not block the camera or touch the screen (Figure 4).
During the guiding phase, when the participants were approaching the item while holding the phone, many participants shifted from two hands to one hand for holding the phone, using the now-free hand for interacting with the app (e.g., selecting the Stop button) and for touching and confirming the products. Again, some participants held the phone with the palm covering the back of the phone, while others held it with the palm over the screen. It appeared that participants made a deliberate effort to keep the phone upright. In fact, some participants did not even tip the phone downward when they needed to tap the screen but instead tried to tap the screen while it was in an upright position. Only one participant reoriented the phone horizontally to press the Guide button. Many participants did begin to tilt the phone toward the end of the guiding process, but they did not seem to notice that they had changed their grip.
In the interview, none of the participants mentioned any problems relating to how or how long they had to hold the phone. However, the video observation suggested some level of discomfort (e.g., flexed wrists and changing hands), pointed to a likely cause of this behavior (trying to sustain the camera angle), and allowed us to connect some comments in the interview to the
observed behaviors. In particular, we identified two needs, supporting the phone camera and receiving the haptic feedback, that were met through the participants' careful movement and, at times, unnatural coordination of their hands and arms. Interestingly, P7 commented that holding the phone while walking is not a good idea, because it increases the chances of dropping it; such concerns could be an additional factor explaining this cautious behavior.

Fig. 4. Participants hold the phone with one hand, with the palm covering either the screen or the back of the phone, or with two hands while on the move.
5.4.2 Touchscreen Interaction: Gesture and Shaking. The AIGuide app provides a simple user interface (two labels and four buttons) that can be navigated with the gesture-based screen reader, VoiceOver. We observed no problems with participants interacting with this user interface, as they used simple gestures such as touching, tapping, and swiping. We saw a pattern of using the thumb, the index finger, or the other free hand for interacting with the app. However, we did note that participants who positioned their palms over the screen had to distort their hands to press the buttons, or had to use their other free hand to do so with an inconvenient posture. This posture looked even more uncomfortable when participants tried to maintain the phone in an upright position and at the same time press a button with the other hand (Figure 5). A transcription of P10's talk and actions in the video regarding this issue is as follows:
. . . to do the Confirm button as you’re holding the phone, you’re going to have to go
like this [straining to touch phone], and some people may not have the dexterity or the
arm ability to hold the phone in place while swiping this way. It felt, if you don’t swipe
exactly, the iPhone considers that you’re swiping up or down, other applications do
this too, if you don’t swipe up in the way it wants you to, it’ll change your speaking
rate as opposed to actually just making the ‘boop’ and then letting you know to swipe.
During the whole guiding process, participants may tap a button a few times. Specifically, the Guide/Stop button can be pressed to start the navigation process right after the scanning phase, and the Confirm button can be tapped once the navigation phase is done. We noted that some participants tapped an incorrect button (e.g., tapping the Restart or Guide button instead of the Confirm button) and triggered an unintended process.

Fig. 5. Postures of some participants interacting with the phone while holding it upright, showing extra extension of the arm and hand and finger distortion.
As an alternative option for confirming the product, participants could use a “shake-the-phone” feature. Except for one participant, this shaking gesture seemed to be novel. Participants seemed unsure about how to do it properly, for example asking how aggressive the shaking should be and in which direction the shaking move should be made (side-to-side or up-and-down; we observed use of both). Although many participants did not believe the confirmation process was necessary, and said during interviews that shaking the phone is not intuitive, we observed that most participants chose to shake the phone rather than tap a button on the touchscreen. This might be an indication that the feature helps make the interaction easy, and comments from P4 and P5 explain why and how.
P4 said, “It’s less stop-and-go, it’s always one hand, because I usually have stuff: a cane, a dog, or a bag.” P5 commented out of concern about losing the found item when moving the phone away:
I think that shaking it would make it more accurate. Because if you tap on the Confirm button, then you may have to move the phone and then have to go back and reposition it. Whereas if you’re just shaking it then you’re already in front of the product. So I probably myself would end up using that [the shaking], even though I know that through the process I typically just tapped on the Confirm button. But if I was shopping and wanted better accuracy, I would just shake it that way.
Interestingly, P8 did not like the shaking feature because of the same concern that P5 had: “It doesn’t seem like a natural thing to do if you are using your camera to find something to shake your phone.”
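On iOS, a shake gesture can be detected by overriding the motion-event callbacks on a view controller. The sketch below is a minimal, hypothetical version of how a shake might trigger the confirmation step; the `startConfirmation()` hook and the spoken announcement (which addresses the missing-feedback issue P4 described) are our own illustration, not AIGuide's actual code.

```swift
import UIKit
import AVFoundation

class GuidanceViewController: UIViewController {
    private let synthesizer = AVSpeechSynthesizer()

    // Allow this controller to receive motion events.
    override var canBecomeFirstResponder: Bool { true }

    override func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)
        becomeFirstResponder()
    }

    // Called by UIKit when the user shakes the device.
    override func motionEnded(_ motion: UIEvent.EventSubtype, with event: UIEvent?) {
        guard motion == .motionShake else { return }
        // Announce the mode change so the user gets audible feedback.
        synthesizer.speak(AVSpeechUtterance(string: "Confirming"))
        startConfirmation() // hypothetical hook into the confirmation phase
    }

    private func startConfirmation() {
        // Placeholder for re-detection of the target object.
    }
}
```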
5.5 Mobile-based Nonvisual Hand Guidance Interaction: Design Considerations
In smartphone camera-supported hand guidance interaction, the user needs to be able to hold the phone with one hand and leave the other hand free while walking toward an object, yet still interact with the phone. We saw that this dual requirement (holding the phone for camera use but also interacting with it) caused physical inconvenience for participants and made phone interactions more challenging. If we add in the extra challenges of managing a cane and/or a guide dog leash, then the challenges would likely become even worse. Furthermore, holding the phone in the hand to receive haptic feedback makes holding an even more taxing task. These physical demands point to a need for a user interaction design that enables one hand to both hold the device comfortably and interact with it at the same time. We consider several approaches informed by the insights from our study.
5.5.1 Design for Supporting One-hand Interaction. To support one-handed interaction, we suggest that the need for touchscreen gesture input be minimized and the process of interaction simplified. We saw that at times participants used their free hand when they needed to interact with the phone (e.g., pressing the Restart button), especially when they were not on the move. However, while walking, participants needed to use the free hand to reach out and touch the item, forcing them to use a single hand to both hold the phone and interact with the app. This situation leaves participants with only the thumb or the index finger to use, which can result in finger strain and uncomfortable motion. Moreover, with either hand, it takes time to find a wanted button, and even then there is a possibility of selecting the wrong button (as we observed with our participants) because the visually oriented touchscreen interface is not well suited to people with visual impairments: it offers no tactile feedback, and the touchscreen is sensitive to accidental touches.
Removing unnecessary button presses, supporting easy use of the thumb within its natural range (for the hand holding the phone), or allowing a tap anywhere on the screen would make the interaction an easier and more efficient process and would support one-handed use. Also, using a built-in smartphone accessibility feature (e.g., Back Tap on the iPhone) is a good option to consider; it would be a convenient input method that does not require any touchscreen interaction.
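As one concrete illustration of the "tap anywhere" idea, a full-screen gesture recognizer could stand in for a dedicated Confirm button. This is a hypothetical sketch, not part of AIGuide; the `confirmSelection()` hook is an assumption.

```swift
import UIKit

class OneHandedGuideViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        // A double tap anywhere on the screen acts as "Confirm",
        // so the user does not have to locate a specific button.
        let tap = UITapGestureRecognizer(target: self, action: #selector(handleDoubleTap))
        tap.numberOfTapsRequired = 2
        view.addGestureRecognizer(tap)
    }

    @objc private func handleDoubleTap() {
        confirmSelection() // hypothetical hook into the app's confirm action
    }

    private func confirmSelection() {
        // Placeholder: trigger the confirmation phase here.
    }
}
```

Note that when VoiceOver is running it intercepts touches, so a design like this would need to be validated with screen-reader users (for example, via a direct-touch area) before relying on it.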
In addition, we suggest creating non-touchscreen interactions that save the effort of screen navigation. The shake-the-phone feature in AIGuide is one example, and our observations show that it supports one-handed interaction. Our study participants used the shaking gesture more frequently than pressing the on-screen Confirm button and reported that it was easier than trying to find a button on the screen. Another possibility is to use the volume buttons on the side of the phone as part of the interaction. P8 shared the use case of the volume buttons being used to take a picture with a smartphone.
If you wanted something other than having to navigate on the screen and double-tap
it, you could do one of the volume buttons. Like how the camera can take a picture if
you use the volume buttons. I know there’s a hook for that in the VoiceOver API.
Last, voice/speech input would also support one-handed interaction. However, this approach may be limited when the app is used in a noisy environment, as discussed in prior research [27, 88]. Technologies for cancelling surrounding noise or detecting the target voice might help address this limitation. In addition, a bone-conduction headset with an integrated microphone could mitigate the concern, since it would not only support voice-based interaction but also let the user with visual impairment receive spatial audio information and remain aware of the surroundings.
5.5.2 Design for Comfortable and Relaxed Holding. From both the interview study and the video analysis, we found that the need to grip the phone carefully to support camera pointing, and to receive haptic feedback, led to muscle tension while holding the phone, potentially causing discomfort. The study revealed that participants were concerned about the camera losing the target if they moved the phone away from the general direction of the item. For example, this prevented them from tilting the phone when using the touchscreen to tap (the camera would point to the floor). This concern contributed to their rigid and constrained movement. In fact, AIGuide does not need to have the object inside the frame at all times; once it detects an object, it maintains its position. Using visual-inertial odometry, it tracks the position of the item with respect to the phone in 3D space. However, it seemed that our participants used AIGuide with a preconception of how the camera should work, without understanding the advanced functionality its detection and guiding system affords. This misconception may help to explain why one participant considered shaking
not to be an intuitive action (presumably because it would disrupt the visual field). P10's comments explain the phenomenon and suggest how to help the user by providing this information:
there should be an additional thing that says “you do not need to have the phone held in this position while you select this,” because my initial thought is if I move the phone to try to click the button safely that I would lose the fact that it found it, when in truth, that’s not the case. That’s why a few times when I did the test, I pulled the phone a little bit away and held it comfortably when I double tapped on where you can actually confirm it, and then I put it forward. I didn’t know that until I tested it.
The video analysis helped us to realize that we had overlooked the possible effect of the participants' initial mental models about visual object recognition and associated guidance (i.e., keeping the desired object "in sight"). At the same time, it highlights an important contribution of AIGuide's AR technology, which offers a more sophisticated level of camera-based interaction (maintaining the detected object's position) and thus allows more flexible holding of the phone.
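This flexibility comes from ARKit's world tracking: once the target has an anchor in the world map, its position stays valid even when the object leaves the camera frame. The sketch below shows, under our assumptions about AIGuide's internals (the anchor variable and the guidance wording are illustrative, not the app's actual code), how distance and a left/right angle can be derived each frame from the camera transform and the saved anchor.

```swift
import ARKit
import Foundation
import simd

// Compute distance and a signed left/right angle from the current camera
// pose to a previously saved anchor. Illustrative only.
func guidanceCue(for anchor: ARAnchor, frame: ARFrame) -> String {
    let cameraTransform = frame.camera.transform
    let cameraPosition = simd_make_float3(cameraTransform.columns.3.x,
                                          cameraTransform.columns.3.y,
                                          cameraTransform.columns.3.z)
    let anchorPosition = simd_make_float3(anchor.transform.columns.3.x,
                                          anchor.transform.columns.3.y,
                                          anchor.transform.columns.3.z)

    let offset = anchorPosition - cameraPosition
    let distanceMeters = simd_length(offset)
    let distanceInches = distanceMeters * 39.37

    // Express the offset in the camera's own coordinate frame so that
    // x > 0 means "to the right of where the phone is pointing".
    let worldToCamera = simd_inverse(cameraTransform)
    let offsetInCamera = worldToCamera * simd_make_float4(offset, 0)
    let angleDegrees = atan2(offsetInCamera.x, -offsetInCamera.z) * 180 / .pi

    let side = angleDegrees >= 0 ? "right" : "left"
    return String(format: "%.0f inches away, %.0f degrees to the %@",
                  Double(distanceInches), Double(abs(angleDegrees)), side)
}
```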
Another condition that requires a tight and careful grip of the phone is its role as the vibrotactile producer of haptic feedback. The vibrational signals require contact with the body (the hand in this case), whereas sound feedback does not require such contact. Some participants saw this requirement as a constraint and thus preferred sound feedback over haptic feedback. To mitigate the issue and improve the experience, four participants proposed the use of wrist wearables (such as a wristband or Apple Watch), pairing them with the iPhone so that they could receive haptic feedback on the wrist. P9 mentioned a commercial wristband product that employs vibrotactile signals and is used to help people with visual impairments with social distancing. Prior research has also investigated the use of wristbands interfaced with the smartphone to enhance the interaction experience with the smartphone [27, 47, 88].
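One way to realize the wrist-wearable suggestion without new hardware would be to relay haptic cues from the phone to a paired Apple Watch over WatchConnectivity. The sketch below is a hypothetical pairing, not a feature of AIGuide; the message key and the watch-side handler are our assumptions.

```swift
import WatchConnectivity

// iPhone side: forward a haptic cue to the paired watch.
final class WristHapticRelay: NSObject, WCSessionDelegate {
    private let session = WCSession.default

    override init() {
        super.init()
        if WCSession.isSupported() {
            session.delegate = self
            session.activate()
        }
    }

    /// Ask the watch to play a haptic; does nothing if the watch is unreachable.
    func sendHapticCue() {
        guard session.isReachable else { return }
        session.sendMessage(["cue": "proximity"], replyHandler: nil, errorHandler: nil)
    }

    // Minimal delegate conformance.
    func session(_ session: WCSession, activationDidCompleteWith state: WCSessionActivationState,
                 error: Error?) {}
    func sessionDidBecomeInactive(_ session: WCSession) {}
    func sessionDidDeactivate(_ session: WCSession) {}
}

// Watch side (in the watchOS extension, with `import WatchKit`):
// func session(_ session: WCSession, didReceiveMessage message: [String: Any]) {
//     if message["cue"] as? String == "proximity" {
//         WKInterfaceDevice.current().play(.directionUp)
//     }
// }
```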
5.5.3 Additional Design Consideration: Complementary Features. Finally, one participant's complimentary comment about the independent speaking rate adjuster for speech output gave us insight into a design consideration regarding simultaneous use of the app with a smartphone's many other functions. P10 described how useful the feature is by sharing his experience:
For example, I can be typing a text message, and I’ll hear the thing go “sending a notification,” and the notification will talk at the same speed rate as what I’m trying to read, like a text message, so my text message will get overlapped by the notification. With AIGuide, you don’t have to worry about that problem, because if the thing speaks slower, you know it’s from AIGuide. If it speaks faster, it’s your phone giving you a very vocal notification.
This example suggests that designers of smartphone-based applications should consider any complication or conflict that could disrupt use; such disruption could create a negative experience that leads to abandonment.
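The behavior P10 describes maps onto the rate parameter of AVSpeechUtterance: if an app consistently speaks at a rate distinct from the user's VoiceOver rate, its output becomes identifiable by sound alone. A minimal sketch follows (the chosen rate value is an arbitrary example, not the rate AIGuide uses):

```swift
import AVFoundation

let synthesizer = AVSpeechSynthesizer()

/// Speak guidance at a fixed, app-specific rate so the user can distinguish it
/// from system VoiceOver announcements.
func speakGuidance(_ text: String) {
    let utterance = AVSpeechUtterance(string: text)
    utterance.rate = 0.4 // within AVSpeechUtteranceMinimumSpeechRate...Maximum; example value
    synthesizer.speak(utterance)
}

speakGuidance("36 inches away, 14 degrees to the right")
```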
6 DISCUSSION
6.1 Performance versus Preference in Designing Feedback
Performance with the three different feedback modes suggested that the sound-only feedback (speech instructions complemented by beeps) may be more effective than the other two options: haptic only (speech + haptic) and the combination of the two (speech + sound + haptic); recall, however, that these differences were not reliable enough to be statistically significant. Yet the sound-only mode was preferred by only two of the 10 participants; this is reminiscent of the classic finding that performance and preference are often not aligned in usability studies [11]. It is interesting that
participants' performance seemed to be slowest in the case of haptic-only feedback, even though many participants perceived vibrations as an easy and quick cue that requires little information-processing effort. This might imply that delivering information through two different modality types (haptic + speech) imposes more cognitive load than delivering a larger amount of information through the same type (sound + speech). It also yields a design implication: the importance of considering possible secondary interactions with the surroundings.
6.2 Designing for Multimodal Feedback
It is interesting to note that we used two different types of auditory representation (sound and speech) for two different types of information: sound for the distance cue and speech for the directional cue. Moreover, the information is not provided at the same time but at different times in a continuous fashion. This way of presenting information might reduce cognitive overload and even facilitate information processing, even though both are auditory representations. This might be a reason that participants described the guidance from AIGuide as responsive, detailed, specific, even powerful. Lee et al. [44] investigated the effectiveness of multimodal feedback, and their finding provides an interesting comparison with our study and supports this interpretation. In their study, multimodal feedback that combined speech and haptic cues was both most preferred and most effective when guiding small-scale hand movements. The feedback design differs between the two studies: in their study, two different modality types (speech and haptic) were used for the same information (directional guidance) and were signaled at the same time, whereas in our study the channels were used for different information (one for distance and the other for direction) at different times. Our study found that the speech and sound combination may be most effective. This might be an indication that the same type of information delivery medium (speech and sound are both auditory) can be effective as long as the channels represent different information.
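A simple way to express this division of labor in code is to route direction through speech and distance through a repeating tone whose interval shrinks as the hand approaches, with haptics optionally mirroring the tone. The sketch below is our reconstruction of such a scheme, not AIGuide's actual implementation; the sound ID, thresholds, and wording are invented.

```swift
import UIKit
import AVFoundation
import AudioToolbox

enum FeedbackMode { case soundOnly, hapticOnly, soundAndHaptic }

final class MultimodalFeedback {
    private let synthesizer = AVSpeechSynthesizer()
    private let haptics = UIImpactFeedbackGenerator(style: .medium)
    var mode: FeedbackMode = .soundAndHaptic

    /// Direction is always spoken, regardless of mode.
    func announceDirection(_ text: String) {
        synthesizer.speak(AVSpeechUtterance(string: text))
    }

    /// Distance is conveyed by a proximity "tick": the closer the target,
    /// the more often the guidance loop should call this.
    func proximityTick() {
        switch mode {
        case .soundOnly:
            AudioServicesPlaySystemSound(1104)   // built-in click; example sound ID
        case .hapticOnly:
            haptics.impactOccurred()
        case .soundAndHaptic:
            AudioServicesPlaySystemSound(1104)
            haptics.impactOccurred()
        }
    }

    /// Example mapping from distance (meters) to tick interval (seconds).
    func tickInterval(forDistance meters: Float) -> TimeInterval {
        return TimeInterval(max(0.1, min(1.0, meters / 2)))
    }
}
```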
6.3 Designing for Mobile Devices
All the participants showed a predilection for the smartphone assistive technology in spite of the need to hold the phone and thereby occupy one hand. Our AIGuide app, which supports object recognition and detection, may cause particular inconvenience for users, because they need to hold the phone at a certain angle for its built-in camera to point to the right spot. This could cause an uncomfortable grip and awkward finger placement. Despite this, the smartphone assistive application was mostly welcomed and appreciated; our participants with visual impairments were willing to work around the inconveniences and use the application. When P8 was asked if there is anything else that could be used instead of a smartphone, he answered, “I don’t think there’s anything other than a smartphone you could use.” The numerous existing examples of smartphone assistive technology clearly support this trend. Participants explained that these preferences relate to portability and a lack of stigma. Being able to use the mainstream technology that sighted people use, and avoiding specialized technology that could draw uncomfortable attention, plays a major role in making a smartphone app the go-to option. P6 said, “So many people use it now.”
Furthermore, our video data analysis and a comment from P1 suggest that the hand maneuverability users have with a smartphone makes it the most suitable form factor for hand guidance aid technology. P1 compared it with smart glasses and made interesting comments:
I don’t think wearing things on your head is a very useful thing. . . . because just my head is pointed at something doesn’t help me find it. There’s this idea, so many people have this idea, that having glasses is useful because it means that—it looks like you’re looking at the right thing, but that’s not really useful. Cause you’re not touching it;
you haven’t really found it—it’s not helping you find your way to it. It’s just that you are pointing your head at it, which if you can’t see is a waste of time. . . . we could run it on Google Glass, it might work, but—and you might be able to get a hold of Google Glass, but it’s not really a product that we can get our hands on and play with.
However, at the same time, inconveniences and challenges were mentioned, and limitations of phone-based interfaces were noted. Some participants suggested the use of wearables such as a wristwatch, a bracelet, or smart glasses, or some pairing of a wearable with a phone. P9 said, “. . . if there was something that had a camera pointed forward over your wrist, then you could just have it like that, that might be useful.”
6.4 Designing for Non-visual Hand Guidance
The evaluation of the AIGuide app with participants with visual impairments allowed us to assess the guidance features that we had designed, identify additional needs and challenges, and obtain valuable insights into how to better provide non-visual hand guidance. Currently, the AIGuide app provides status information to help the user understand the situation, target location information to help the user with orientation, error correction to help the user stay on the desired path, and confirmation to help the user affirm object acquisition. Our study results and the qualitative analysis of both the interviews and the videos show that all of these features provided the intended assistance and contributed to positive and rewarding experiences.
6.4.1 Improvements for Scanning and Navigating. Our evaluation also helped us to find gaps in the design of our guidance system. One issue concerns status information: Most participants appreciated status updates (e.g., what the system is doing, how far or close an object is from the phone camera). However, AIGuide offers no status information about whether the phone is facing backward or downward, which is also important for camera-based phone interaction. Yoon et al. [90] discussed the same issue in their research. During the study, an experimenter provided this information and corrected errors when needed, for instance when camera orientation posed a significant interruption. Furthermore, we see a need for guidance that would help participants orient themselves relative to the position of the target object. When the scanning phase was prolonged, we noted that participants' extended panning behavior (trying to cover as much space as possible) sometimes caused them to turn away from the target area and lose their sense of orientation.
We also found that descriptive information about where the target is (distance, angle, and whether it is below or above the phone camera position) could be effective in helping participants orient themselves to the target item and prepare to move the camera toward it. To make this location information more helpful, participants suggested using a clock-face location in addition to the degree indicator, because they are familiar with this representation from other common navigation aids (e.g., BlindSquare) and from their Orientation and Mobility training. This suggestion echoes the research of Chen et al. [21], which found a clock-face representation for location to be helpful.
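Converting the degree indicator into a clock-face phrase is straightforward; a sketch of the mapping participants asked for (our own illustration, not AIGuide's current output) follows.

```swift
import Foundation

/// Convert a horizontal offset in degrees (0 = straight ahead, positive = right,
/// negative = left) into a clock-face direction such as "2 o'clock".
func clockFaceDirection(fromDegrees degrees: Double) -> String {
    // Normalize to 0..<360 so that -30 degrees becomes 330 (i.e., 11 o'clock).
    let normalized = (degrees.truncatingRemainder(dividingBy: 360) + 360)
        .truncatingRemainder(dividingBy: 360)
    // Each hour on the clock spans 30 degrees; 12 o'clock is straight ahead.
    var hour = Int((normalized / 30).rounded()) % 12
    if hour == 0 { hour = 12 }
    return "\(hour) o'clock"
}

print(clockFaceDirection(fromDegrees: 14))   // "12 o'clock" (nearly straight ahead)
print(clockFaceDirection(fromDegrees: 45))   // "2 o'clock"
print(clockFaceDirection(fromDegrees: -90))  // "9 o'clock"
```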
6.4.2 Improving the Confirmation Phase. Although participants did not find the confirmation phase to be needed in our experiment, they agreed that it is a necessary step in general. However, most participants found the confirmation process in AIGuide to be confusing. It seemed that their usual actions in confirming an object with a camera did not map well to what AIGuide needed for the confirmation process. For the confirmation step, AIGuide re-detects the object. Because object detection is done through ARKit, AIGuide needs to capture enough feature points from the object to confirm it. Feature point collection depends on two things: movement of the camera and how much of the object is inside the camera view. If the camera is too close to the item, ARKit cannot generate enough feature points to match the saved model. Additionally, if there is no movement, then ARKit has trouble generating feature points, because it relies on visual-inertial odometry. Thus, the camera and the product should not be too close to each other; there needs to be enough space between them.
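For context, re-detection in ARKit relies on reference objects that were scanned ahead of time and bundled with the app. The following sketch shows the standard ARKit setup for this kind of detection; the asset-group name and the delegate's response are assumptions, not AIGuide's actual resource names or code.

```swift
import ARKit

final class ConfirmationController: NSObject, ARSessionDelegate {
    let session = ARSession()

    /// Start (or restart) world tracking with the pre-scanned reference objects.
    func beginConfirmation() {
        // "GroceryItems" is a hypothetical AR resource group name.
        guard let referenceObjects = ARReferenceObject.referenceObjects(
            inGroupNamed: "GroceryItems", bundle: nil) else {
            fatalError("Missing reference objects in the app bundle.")
        }
        let configuration = ARWorldTrackingConfiguration()
        configuration.detectionObjects = referenceObjects
        session.delegate = self
        session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
    }

    /// ARKit adds an ARObjectAnchor once enough feature points match a scanned model.
    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        for case let objectAnchor as ARObjectAnchor in anchors {
            let name = objectAnchor.referenceObject.name ?? "item"
            print("Confirmed: \(name)") // e.g., speak "Confirmed: cereal box"
        }
    }
}
```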
However, we observed that all participants positioned the camera directly in front of the product when asked to start the confirmation action; they then added some movement back and forth when they realized AIGuide needed more information. We believe we can improve this confirmation process by better matching what the participants naturally want to do. For example, we can investigate the feasibility of an automated process that uses built-in camera zoom functionality to vary distance and space and obtain the needed view of the product image.
6.5 Generalization and Scalability
In our user study, we evaluated the functionality of the AIGuide app as an assistive technology for helping people with visual impairments to find an object and pick it up. We used three grocery products as target items within scenarios of identifying purchased items at home. After experiencing generally high accuracy in recognizing a target item and guiding the hand to it, our participants expressed excitement about the possibility that AIGuide could be generalized to other situations and shared the use cases they could envision. The most common envisioned task was identifying similar objects such as clothes, medications, beauty and health products, cleaning products, and similarly shaped products. P2 said, “Anything that can help you find something is good.” P1 excitedly listed the many cases for which AIGuide would be very useful: “I would say grocery shopping, clothes shopping, helping to locate objects around the house. I don’t know if there could be a library for like keys or shoes, like if you’ve lost something or misplaced an object like a cup. You don’t know what you did with your cup. . . this is actually telling you that an object is there and guiding you to that object, so that would really expand the usability and functionality of an app like this. A lot of apps tell you what’s around, but they don’t guide you to what’s around.”
7 LIMITATIONS AND FUTURE WORK
A limitation of our study is the modest rigor of our performance data; we were not able to fully control the experiment, because it was conducted remotely in the participants' home spaces, and the resulting variability may have contributed to our inability to detect statistically reliable performance differences across feedback modes. The pandemic forced us to redesign our study to operate in a remote and distributed setting. However, we viewed this challenge as an opportunity and have documented our approach to serve as guidance for others. Another limitation we attribute to the need for a remote experimental setting was the simplified arrangement of the target items; we organized the task in this way to ease the process of remote setup, particularly for participants with visual impairments. However, this simplification reduces the study's ecological validity, because such arrangements are likely not typical in the real world (i.e., we assume that items are usually untidily organized, with much more clutter).
A limitation of our current system is that it detects only specific items whose feature-point scans are preloaded into the application. The current version of AIGuide might be appropriate as a grocery store application for identifying and acquiring a product, or for finding known items around the house. However, the problem of finding generic objects such as a shoe, an apple, or a bottle has not been addressed yet. To overcome this limitation, we plan to incorporate a machine learning model to perform 3D object detection of generic objects. The modular design of our application will allow us to swap the current object detection module for an improved version, and also to take advantage of new depth-sensing technologies such as LiDAR scanners and time-of-flight cameras arriving in consumer mobile devices [55, 73].
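The modular swap described here can be expressed as a small protocol boundary, so a scanned-object detector and a future ML- or depth-based detector are interchangeable. This is a design sketch under our assumptions about the app's structure, not its actual code.

```swift
import simd

/// A detected target, expressed as a 3D position in the AR world frame.
struct DetectedObject {
    let label: String
    let worldPosition: simd_float3
}

/// Boundary that lets detection back ends be swapped without touching guidance code.
protocol ObjectDetector {
    func startDetecting(target: String, onDetection: @escaping (DetectedObject) -> Void)
    func stop()
}

/// Current approach: matches pre-scanned ARKit reference objects.
final class ReferenceObjectDetector: ObjectDetector {
    func startDetecting(target: String, onDetection: @escaping (DetectedObject) -> Void) {
        // Configure ARKit detectionObjects here (see the earlier sketch).
    }
    func stop() {}
}

/// Future approach: a machine learning model for generic 3D object detection.
final class GenericMLDetector: ObjectDetector {
    func startDetecting(target: String, onDetection: @escaping (DetectedObject) -> Void) {
        // Run an on-device model (and, where available, depth from LiDAR or
        // a time-of-flight camera) to localize generic objects.
    }
    func stop() {}
}

// Guidance code depends only on the protocol, so back ends are interchangeable.
func makeDetector(useGenericModel: Bool) -> ObjectDetector {
    useGenericModel ? GenericMLDetector() : ReferenceObjectDetector()
}
```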
To compensate for this limitation in this study, we selected objects that varied in size and shape and evaluated AIGuide's performance across these size and shape characteristics. However, the specific selections we made might lower our ecological validity, because the target items were chosen to be distinguishable based on the tactile feedback available through a hand touch.
Recall that, in the case of failures in object recognition and localization, we provided error recovery and repair processes to the participants. However, only one participant triggered the error condition and experienced the error recovery interaction. We assume that the relatively simple study setup and the easily detectable objects in the small item set did not create a difficult task for the system, and thus the recovery function was rarely initiated. In future studies, we plan to extend the tasks to more complex real-world scenes, where we expect to observe system errors and related error repair. We are also investigating the potential for a feature that would allow people with visual impairments to train the application to detect personal objects. Finally, inspired by conversational interaction with remote sighted assistance through mobile-based applications, we are planning to release a version of AIGuide with a conversational interface powered by natural language processing technology.
8 CONCLUSION
We presented AIGuide, a smartphone application designed as a visual prosthetic to help people with visual impairments not just locate but also pick up objects in their surroundings. It leverages the ARKit framework for localization and object detection in real time, without the need to have the objects inside the camera view at all times. It guides the user's hand with auditory and haptic feedback. It does not use external sensors and does not need an internet or wireless connection. We performed a user study with 10 participants with visual impairments and evaluated AIGuide for its ability to guide users to grasp objects. The user study showed that AIGuide successfully finds a target object and quickly guides the hand of a person with visual impairments to that object. We found that the guidance features of AIGuide were effective and efficient for the task of hand-to-object guidance. Additionally, we analyzed video recordings of the user sessions to understand the users' experience of smartphone interaction when it is used for nonvisual hand guidance. This additional analysis revealed the need for interaction designs that are one-handed, as well as design considerations that can support such interaction and enhance the guidance provided by AIGuide. In future work, we plan to replace the detection algorithm with one that will allow AIGuide to detect more generic objects, implement a conversational interface and explore other options for nonvisual interaction, and add the ability to train AIGuide to detect personal objects.
ACKNOWLEDGMENTS
We thank our participants with visual impairments for participating in the user study. We thank
Daniel Yi, Rachel Bartuska, Drew Ronk, Justin Tovar, Julia Jablonski, and Madison Reddie for their
assistance in data analysis.
REFERENCES
[1] [n.d.]. Accessibility—Vision. Retrieved from https://www.apple.com/accessibility/vision/.
[2] [n.d.]. TalkBack—Android Accessibility Help. Retrieved from https://support.google.com/accessibility/android/topic/
3529932?hl=en&ref_topic=9078845.
[3] Dragan Ahmetovic, Federico Avanzini, Adriano Baratè, Cristian Bernareggi, Gabriele Galimberti, Luca A. Ludovico,
Sergio Mascetti, and Giorgio Presti. 2019. Sonification of rotation instructions to support navigation of people with
visual impairment. In Proceedings of the IEEE International Conference on Pervasive Computing and Communications
(PerCom’19). 1–10. https://doi.org/10.1109/PERCOM.2019.8767407
[4] Dragan Ahmetovic, Daisuke Sato, Uran Oh, Tatsuya Ishihara, Kris Kitani, and Chieko Asakawa. 2020. Recog: Support-
ing blind people in recognizing personal objects. In Proceedings of the CHI Conference on Human Factors in Computing
Systems. 1–12.
[5] Aipoly [n.d.]. Retrieved May 1, 2020 from https://www.aipoly.com/.
[6] Aira [n.d.]. Retrieved May 1, 2020 from https://aira.io/.
[7] ARAnchor [n.d.]. Retrieved May 7, 2020 from https://developer.apple.com/documentation/arkit/aranchor.
[8] Mauro Avila Soto, Markus Funk, Matthias Hoppe, Robin Boldt, Katrin Wolf, and Niels Henze. 2017. Dronenaviga-
tor: Using leashed and free-floating quadcopters to navigate visually impaired travelers. In Proceedings of the 19th
International ACM Sigaccess Conference on Computers and Accessibility. 300–304.
[9] Shiri Azenkot, R. E. Ladner, and Jacob Wobbrock. 2011. Smartphone haptic feedback for nonvisual wayfinding. In
Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS’11).https:
//doi.org/10.1145/2049536.2049607
[10] Shiri Azenkot, Richard E. Ladner, and Jacob O. Wobbrock. 2011. Smartphone haptic feedback for nonvisual wayfind-
ing. In Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility. 281–282.
[11] Robert W. Bailey. 1993. Performance vs. preference. In Proceedings of the Human Factors and Ergonomics Society
Annual Meeting, Vol. 37. SAGE Publications, Los Angeles, CA, 282–286.
[12] Gabriel Baud-Bovy, Lope Ben Porquis, Fabio Ancarani, and Monica Gori. 2017. ABBI: A wearable device for improv-
ing spatial cognition in visually-impaired children. In IEEE Biomedical Circuits and Systems Conference (BioCAS’17).
1–4.
[13] Serge Belongie. 2007. Project GroZi: Assistive navigational technology for the visually impaired. J. Vis. 7, 15 (2007),
37–37.
[14] BeMyEyes. Retrieved May 1, 2020 from https://www.bemyeyes.com/.
[15] Jeffrey P. Bigham, Chandrika Jayant, Hanjie Ji, Greg Little, Andrew Miller, Robert C. Miller, Robin Miller, Aubrey
Tatarowicz, Brandyn White, Samuel White, et al. 2010. VizWiz: Nearly real-time answers to visual questions. In
Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology. 333–342.
[16] Mike Boland. 2019. ARCore Reaches 400 Million Devices. Retrieved from arinsider.co/2019/05/13/arcore-reaches-
400-million-devices/.
[17] Roger Boldu, Alexandru Dancu, Denys J. C. Matthies, Thisum Buddhika, Shamane Siriwardhana, and Suranga
Nanayakkara. 2018. FingerReader2.0: Designing and evaluating a wearable finger-worn camera to assist people with
visual impairments while shopping. Proc. ACM Interact. Mobile Wear. Ubiq. Technol. 2, 3 (2018), 1–19.
[18] Said Boularouk, Didier Josselin, and Eitan Altman. 2017. Open source tools for locomotion and apprehension of
space by visually impaired persons: Some propositions to build a prototype based on Arduino, speech recognition
and OpenStreetMap. In Societal Geo-Innovation.
[19] Anke Brock, Slim Kammoun, Marc Macé, and Christophe Jouffrais. 2014. Using wrist vibrations to guide hand move-
ment and whole body navigation. i-com 13, 3 (2014), 19–28.
[20] Hsuan-Eng Chen, Yi-Ying Lin, Chien-Hsing Chen, and I-Fang Wang. 2015. BlindNavi. In Proceedings of the 33rd
Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems. ACM. https://doi.org/10.1145/
2702613.2726953
[21] Hsuan-Eng Chen, Yi-Ying Lin, Chien-Hsing Chen, and I-Fang Wang. 2015. BlindNavi: A navigation app for the
visually impaired smartphone user. In Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human
Factors in Computing Systems. 19–24.
[22] Akansel Cosgun, E. Akin Sisbot, and Henrik I. Christensen. 2014. Evaluation of rotational and directional vibration
patterns on a tactile belt for guiding visually impaired people. In Proceedings of the IEEE Haptics Symposium (HAP-
TICS’14). IEEE, 367–370.
[23] Koen Crommentuijn and Fredrik Winberg. 2006. Designing auditory displays to facilitate object localization in vir-
tual haptic 3D environments. In Proceedings of the 8th International ACM SIGACCESS Conference on Computers and
Accessibility. 255–256.
[24] Florian Dramas, Bernard Oriola, Brian G. Katz, Simon J. Thorpe, and Christophe Jouffrais. 2008. Designing an assistive device for the blind based on object localization and augmented auditory reality. In Proceedings of the 10th International ACM SIGACCESS Conference on Computers and Accessibility. 263–264.
[25] Giuseppe Ghiani, Barbara Leporini, and Fabio Paternò. 2009. Vibrotactile feedback to aid blind users of mobile guides.
J. Vis. Lang. Comput. 20, 5 (10 2009), 305–317. https://doi.org/10.1016/j.jvlc.2009.07.004
[26] William Grussenmeyer and Eelke Folmer. 2017. Accessible touchscreen technology for people with visual impairments: A survey. ACM Trans. Access. Comput. 9 (1 2017), Article 6. https://doi.org/10.1145/3022701
[27] William Grussenmeyer and Eelke Folmer. 2017. Accessible touchscreen technology for people with visual impair-
ments: A survey. ACM Trans. Access. Comput. 9, 2 (2017), 1–31.
[28] Anhong Guo, Xiang ’Anthony’ Chen, Haoran Qi, Samuel White, Suman Ghosh, Chieko Asakawa, and Jeffrey P. Bigham. 2016. VizLens: A robust and interactive screen reader for interfaces in the real world. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology. 651–664.
[29] Anhong Guo, Saige McVea, Xu Wang, Patrick Clary, Ken Goldman, Yang Li, Yu Zhong, and Jeffrey P. Bigham. 2018. Investigating cursor-based interactions to support non-visual exploration in the real world. In Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility. 3–14.
[30] Wilko Heuten, Niels Henze, Susanne Boll, and Martin Pielot. 2008. Tactile wayfinder: A non-visual support system for wayfinding. In Proceedings of the 5th Nordic Conference on Human-Computer Interaction: Building Bridges. 172–181.
[31] Jonggi Hong, Alisha Pradhan, Jon E. Froehlich, and Leah Findlater. 2017. Evaluating wrist-based haptic feedback for non-visual target finding and path tracing on a 2D surface. In Proceedings of the 19th International ACM SIGACCESS Conference on Computers and Accessibility. 210–219.
[32] Felix Huppert, Gerold Hoelzl, and Matthias Kranz. 2021. GuideCopter—A precise drone-based haptic guidance in-
terface for blind or visually impaired people. In Proceedings of the CHI Conference on Human Factors in Computing
Systems. 1–14.
[33] Implementing an iOS Settings Bundle. Retrieved May 5, 2020 from https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/UserDefaults/Preferences/Preferences.html.
[34] Chandrika Jayant, Hanjie Ji, Samuel White, and Jeffrey P. Bigham. 2011. Supporting blind photography. In Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility. 203–210.
[35] Hernisa Kacorri, Kris M. Kitani, Jeffrey P. Bigham, and Chieko Asakawa. 2017. People with visual impairment training personal object recognizers: Feasibility and challenges. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 5839–5849.
[36] Slim Kammoun, Christophe Jouffrais, Tiago Guerreiro, Hugo Nicolau, and Joaquim Jorge. 2012. Guiding blind people with haptic feedback. In Proceedings of Frontiers in Accessibility for Pervasive Computing (Pervasive’12), Vol. 3.
[37] Shaun K. Kane, Brian Frey, and Jacob O. Wobbrock. 2013. Access lens: A gesture-based screen reader for real-world
documents. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 347–350.
[38] Shaun K. Kane, Meredith Ringel Morris, Annuska Z. Perkins, Daniel Wigdor, Richard E. Ladner, and Jacob O. Wob-
brock. 2011. Access overlays: Improving non-visual access to large touch screens for blind users. In Proceedings of
the 24th Annual ACM Symposium on User Interface Software and Technology. 273–282.
[39] Vinitha Khambadkar and Eelke Folmer. 2013. GIST: A gestural interface for remote nonvisual spatial perception. In
Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology. 301–310.
[40] Jee-Eun Kim, Masahiro Bessho, Shinsuke Kobayashi, Noboru Koshizuka, and Ken Sakamura. 2016. Navigating visu-
ally impaired travelers in a large train station using smartphone and bluetooth low energy. In Proceedings of the 31st
Annual ACM Symposium on Applied Computing. 604–611. https://doi.org/10.1145/2851613.2851716
[41] Kibum Kim, Xiangshi Ren, Seungmoon Choi, and Hong Z. Tan. 2016. Assisting people with visual impairments in
aiming at a target on a large wall-mounted display. Int. J. Hum.-Comput. Stud. 86 (2016), 109–120.
[42] KNFB Reader [n.d.]. Retrieved May 1, 2020 from https://www.knfbreader.com/.
[43] Kyungjun Lee, Jonggi Hong, Simone Pimento, Ebrima Jarjue, and Hernisa Kacorri. 2019. Revisiting blind photography
in the context of teachable object recognizers. In Proceedings of the 21st International ACM SIGACCESS Conference
on Computers and Accessibility. 83–95.
[44] Sooyeon Lee, Chien Wen Yuan, Benjamin V. Hanrahan, Mary Beth Rosson, and John M. Carroll. 2017. Reaching out: Investigating different modalities to help people with visual impairments acquire items. In Proceedings of the 19th International ACM SIGACCESS Conference on Computers and Accessibility. 389–390.
[45] Xiaoping Liu, He Zhang, Lingqiu Jin, and Cang Ye. 2018. A wearable robotic object manipulation aid for the visually
impaired. In Proceedings of the IEEE 1st International Conference on Micro/Nano Sensors for AI, Healthcare, and Robotics
(NSENS’18). IEEE, 5–9.
[46] Adriano Mancini, Emanuele Frontoni, and Primo Zingaretti. 2018. Mechatronic system to help visually impaired
users during walking and running. IEEE Trans. Intell. Transport. Syst. 19, 2 (2018), 649–660.
[47] Roberto Manduchi. 2012. Mobile vision as assistive technology for the blind: An experimental study. In Computers
Helping People with Special Needs, Klaus Miesenberger, Arthur Karshmer, Petr Penaz, and Wolfgang Zagler (Eds.).
Springer, Berlin, 9–16.
[48] Roberto Manduchi and James M. Coughlan. 2014. The last meter: Blind visual guidance to a target. In Proceedings of
the SIGCHI Conference on Human Factors in Computing Systems. 3113–3122.
[49] Roberto Manduchi, Sri Kurniawan, and Homayoun Bagherinia. 2010. Blind guidance using mobile computer vision:
A usability study. In Proceedings of the 12th International ACM SIGACCESS Conference on Computers and Accessibility.
241–242.
[50] Lot B. Merabet and Jaime Sánchez. 2016. Development of an audio-haptic virtual interface for navigation of large-
scale environments for people who are blind. In Proceedings of the International Conference on Universal Access in
Human-Computer Interaction. Springer, 595–606.
[51] John Morris and James Mueller. 2014. Blind and deaf consumer preferences for android and iOS smartphones. In
Inclusive Designing. Springer, 69–79.
[52] Cecily Morrison, Edward Cutrell, Martin Grayson, Anja Thieme, Alex Taylor, Geert Roumen, Camilla Longden,
Sebastian Tschiatschek, Rita Faia Marques, and Abigail Sellen. 2021. Social Sensemaking with AI: Designing an
open-ended AI experience with a blind child. In Proceedings of the CHI Conference on Human Factors in Computing
Systems. 1–14.
[53] Navilens [n.d.]. Retrieved May 1, 2020 from https://www.navilens.com/.
[54] Leo Neat, Ren Peng, Siyang Qin, and Roberto Manduchi. 2019. Scene text access: A comparison of mobile OCR
modalities for blind users. In Proceedings of the 24th International Conference on Intelligent User Interfaces. 197–207.
[55] Apple Newsroom. [n.d.]. Apple Unveils New iPad Pro with LiDAR Scanner and Trackpad Support in iPadOS. Retrieved from www.apple.com/newsroom/2020/03/apple-unveils-new-ipad-pro-with-lidar-scanner-and-trackpad-support-in-ipados/.
[56] John Nicholson, Vladimir Kulyukin, and Daniel Coster. 2009. ShopTalk: Independent blind shopping through verbal
route directions and barcode scans. Open Rehabil. J. 2, 1 (2009).
[57] Object Recognition [n.d.]. Retrieved May 4, 2020 from https://library.vuforia.com/articles/Training/Object-Recognition.
[58] Uran Oh, Shaun Kane, and Leah Findlater. 2013. Follow that sound: Using sonification and corrective verbal feedback to teach touchscreen gestures. In Proceedings of the 15th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS’13). https://doi.org/10.1145/2513383.2513455
[59] Uran Oh, Shaun K. Kane, and Leah Findlater. 2013. Follow that sound: Using sonification and corrective verbal feedback to teach touchscreen gestures. In Proceedings of the 15th International ACM SIGACCESS Conference on Computers and Accessibility. 1–8.
[60] Eshed OhnBar, Kris Kitani, and Chieko Asakawa. 2018. Personalized dynamics models for adaptive assistive naviga-
tion systems. In Conference on Robot Learning. PMLR, 16–39.
[61] Sabrina Paneels, Margarita Anastassova, Steven Strachan, Sophie Pham Van, Saranya Sivacoumarane, and Christian
Bolzmacher. 2013. What’s around me? Multi-actuator haptic feedback on the wrist. In Proceedings of the World Haptics
Conference (WHC’13). IEEE, 407–412.
[62] Gaëtan Parseihian, Charles Gondre, Mitsuko Aramaki, Sølvi Ystad, and Richard Kronland-Martinet. 2016. Comparison and evaluation of sonification strategies for guidance tasks. IEEE Trans. Multimedia 18, 4 (2016), 674–686.
[63] Martin Pielot, Benjamin Poppinga, Wilko Heuten, and Susanne Boll. 2011. A tactile compass for eyes-free pedestrian navigation. In IFIP Conference on Human-Computer Interaction. Springer, Berlin, Heidelberg, 640–656.
[64] Giorgio Presti, Dragan Ahmetovic, Mattia Ducci, Cristian Bernareggi, Luca Ludovico, Adriano Baratè, Federico Avanzini, and Sergio Mascetti. 2019. WatchOut: Obstacle sonification for people with visual impairment or blindness. In Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility. 402–413.
[65] Kay E. Ramey, Dionne N. Champion, Elizabeth B. Dyer, Danielle T. Keifert, Christina Krist, Peter Meyerhoff, Krystal Villanosa, and Jaakko Hilppö. 2016. Qualitative analysis of video data: Standards and heuristics. In Transforming Learning, Empowering Learners: The International Conference of the Learning Sciences, C. K. Looi, J. L. Polman, U. Cress, and P. Reimann (Eds.).
[66] David A. Ross and Bruce B. Blasch. 2000. Wearable interfaces for orientation and wayfinding. In Proceedings of the 4th International ACM Conference on Assistive Technologies. 193–200.
[67] Daisuke Sato, Uran Oh, Kakuya Naito, Hironobu Takagi, Kris Kitani, and Chieko Asakawa. 2017. NavCog3: An evaluation of a smartphone-based blind indoor navigation assistant with semantic features in a large-scale environment. In Proceedings of the 19th International ACM SIGACCESS Conference on Computers and Accessibility. 270–279.
[68] Shantanu A. Satpute, Janet R. Canady, Roberta L. Klatzky, and George D. Stetten. 2019. FingerSight: A vibrotactile
wearable ring for assistance with locating and reaching objects in peripersonal space. IEEE Trans. Hapt. 13, 2 (2019),
325–333.
[69] Scanning and Detecting 3D Object [n.d.]. Retrieved May 4, 2020 from https://developer.apple.com/documentation/
arkit/scanning_and_detecting_3d_objects.
[70] Seeing AI [n.d.]. Retrieved May 1, 2020 from https://www.microsoft.com/en-us/ai/seeing-ai.
[71] Chirayu Shah, Mourad Bouzit, Meriam Youssef, and Leslie Vasquez. 2006. Evaluation of RU-netra-tactile feedback
navigation system for the visually impaired. In Proceedings of the International Workshop on Virtual Rehabilitation.
IEEE, 72–77.
[72] Roy Shilkrot, Jochen Huber, Wong Meng Ee, Pattie Maes, and Suranga Chandima Nanayakkara. 2015. FingerReader:
A wearable device to explore printed text on the go. In Proceedings of the 33rd Annual ACM Conference on Human
Factors in Computing Systems. 2363–2372.
[73] The Ocial Samsung Galaxy Site. [n.d.]. What Is ToF Camera Technology on Galaxy and How Does It Work? Re-
trieved from www.samsung.com/global/galaxy/what-is/tof-camera/.
[74] Soundscape [n.d.]. Retrieved May 1, 2020 from https://www.microsoft.com/en-us/research/product/soundscape/.
[75] Andrii Soviak, Anatoliy Borodin, Vikas Ashok, Yevgen Borodin, Yury Puzis, and I. V. Ramakrishnan. 2016. Tactile
accessibility: Does anyone need a haptic glove? In Proceedings of the 18th International ACM SIGACCESS Conference
on Computers and Accessibility. 101–109.
[76] Akshaya Kesarimangalam Srinivasan, Shwetha Sridharan, and Rajeswari Sridhar. 2020. Object localization and navi-
gation assistant for the visually challenged. In Proceedings of the 4th International Conference on Computing Method-
ologies and Communication (ICCMC’20). IEEE, 324–328.
[77] Lee Stearns, Victor DeSouza, Jessica Yin, Leah Findlater, and Jon E. Froehlich. 2017. Augmented reality magnification for low vision users with the Microsoft HoloLens and a finger-worn camera. In Proceedings of the 19th International ACM SIGACCESS Conference on Computers and Accessibility. 361–362.
[78] Lee Stearns, Ruofei Du, Uran Oh, Catherine Jou, Leah Findlater, David A. Ross, and Jon E. Froehlich. 2016. Evaluating haptic and auditory directional guidance to assist blind people in reading printed text using finger-mounted cameras. ACM Trans. Access. Comput. 9, 1 (2016), 1–38.
[79] Kaveri Thakoor, Nii Mante, Carey Zhang, Christian Siagian, James Weiland, Laurent Itti, and Gérard Medioni. 2014. A
system for assisting the visually impaired in localization and grasp of desired objects. In Proceedings of the European
Conference on Computer Vision. Springer, 643–657.
[80] Understanding ARKit Tracking and Detection [n.d.]. Retrieved May 4, 2020 from https://developer.apple.com/videos/play/wwdc2018/610.
[81] G. Vanderheiden. 1996. Use of audio-haptic interface techniques to allow nonvisual access to touchscreen appliances.
Proceedings of the Human Factors and Ergonomics Society Annual Meeting 40, 24 (1996), 1266–1266.
[82] Marynel Vázquez and Aaron Steinfeld. 2012. Helping visually impaired users properly aim a camera. In Proceedings
of the 14th International ACM SIGACCESS Conference on Computers and Accessibility. 95–102.
[83] Ramiro Velázquez, Edwige Pissaloux, and Aimé Lay-Ekuakille. 2015. Tactile-foot stimulation can assist the navigation of people with visual impairment. Appl. Bionics Biomechan. (2015).
[84] Andreas Wachaja, Pratik Agarwal, Mathias Zink, Miguel Reyes Adame, Knut Möller, and Wolfram Burgard. 2015.
Navigating blind people with a smart walker. In Proceedings of the IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS’15). IEEE, 6014–6019.
[85] Hsueh-Cheng Wang, Robert K. Katzschmann, Santani Teng, Brandon Araki, Laura Giarré, and Daniela Rus. 2017.
Enabling independent navigation for visually impaired people through a wearable vision-based feedback system. In
Proceedings of the IEEE International Conference on Robotics and Automation (ICRA’17). IEEE, 6533–6540.
[86] Graham Wilson and Stephen A. Brewster. 2016. Using dynamic audio feedback to support peripersonal reaching in
young visually impaired people. In Proceedings of the 18th International ACM SIGACCESS Conference on Computers
and Accessibility. 209–218.
[87] Koji Yatani, Nikola Banovic, and Khai Truong. 2012. SpaceSense: Representing geographical information to visually impaired people using spatial tactile feedback. In Proceedings of the Conference on Human Factors in Computing Systems. https://doi.org/10.1145/2207676.2207734
[88] Hanlu Ye, Meethu Malu, Uran Oh, and Leah Findlater. 2014. Current and future mobile and wearable device use by people with visual impairments. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 3123–3132.
[89] Chris Yoon, Ryan Louie, J. Ryan, MinhKhang Vu, Hyegi Bang, William Derksen, and P. Ruvolo. 2019. Leveraging
augmented reality to create apps for people with visual disabilities: A case study in indoor navigation. In Proceedings
of the 21st International ACM SIGACCESS Conference on Computers and Accessibility.
[90] Chris Yoon, Ryan Louie, Jeremy Ryan, MinhKhang Vu, Hyegi Bang, William Derksen, and Paul Ruvolo. 2019. Lever-
aging augmented reality to create apps for people with visual disabilities: A case study in indoor navigation. In
Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility. 210–221.
[91] John S. Zelek, Sam Bromley, Daniel Asmar, and David Thompson. 2003. A haptic glove as a tactile-vision sensory
substitution for waynding. J. Vis. Impair. Blindness 97, 10 (2003), 621–632.
[92] Yuhang Zhao, Elizabeth Kupferstein, Hathaitorn Rojnirun, Leah Findlater, and Shiri Azenkot. 2020. The effectiveness of visual and audio wayfinding guidance on smartglasses for people with low vision. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 1–14.
[93] Yu Zhong, Pierre J. Garrigues, and Jeffrey P. Bigham. 2013. Real time object scanning using a mobile phone and cloud-based visual search engine. In Proceedings of the 15th International ACM SIGACCESS Conference on Computers and Accessibility. 1–8.
[94] Peter A. Zientara, Sooyeon Lee, Gus H. Smith, Rorry Brenner, Laurent Itti, Mary B. Rosson, John M. Carroll, Kevin M.
Irick, and Vijaykrishnan Narayanan. 2017. Third eye: A shopping assistant for the visually impaired. Computer 50, 2
(2017), 16–24.
[95] Zoom [n.d.]. Retrieved from https://zoom.us/.
Received June 2021; revised October 2021; accepted December 2021