Despite the major role of Global Positioning Systems (GPS) as a navigation tool for people with visual impairments (VI), a crucial missing aspect of point-to-point navigation with these systems is the last-few-meters wayfinding problem. Due to GPS inaccuracy and inadequate map data, systems often bring a user to the vicinity of a destination but not to the exact location, causing challenges such as difficulty locating building entrances or a specific storefront from a series of stores. In this paper, we study this problem space in two parts: (1) A formative study (N=22) to understand challenges, current resolution techniques, and user needs; and (2) A design probe study (N=13) using a novel, vision-based system called Landmark AI to understand how technology can better address aspects of this problem. Based on these investigations, we articulate a design space for systems addressing this challenge, along with implications for future systems to support precise navigation for people with VI.
... The advent of smartphone-based applications like Google Maps [1], BlindSquare [7], and others has significantly enhanced outdoor navigation using the Global Positioning System (GPS) and mapping services such as the Google Maps Platform [2] and OpenStreetMap [3]. However, GPS accuracy can be off by up to ±5 meters [17], which poses challenges, especially in "last-few-meters" navigation [32]. Indoor environments exacerbate these challenges due to poor GPS reception and the absence of detailed indoor mapping [24,31]. ...
... In recent advancements, the application of CV technologies has emerged as a cost-effective approach for enhancing indoor navigation [10,42]. Using smartphones, CV-based systems can interpret visual cues such as recognized objects [46], color codes, and significant landmarks or signage [15,32]. These systems can also process various tags like barcodes, RFID, or vanishing points for better navigation support [12,25,35]. However, relying solely on CV for precise navigation for PVI remains insufficient [32]. Our prototype integrates CV, particularly utilizing LiDAR, with an LLM to jointly provide assistance for PVI during their travels. ...
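The excerpts above describe a common pattern: a coarse GPS fix is refined once an on-device detector recognizes a known landmark or sign near the destination. The following is a minimal sketch of that idea; the landmark table, the detected label, and the estimated camera distance and bearing are hypothetical inputs, not part of any cited system.

```python
# Minimal sketch (hypothetical inputs, not a cited system): refining a coarse GPS fix
# once a known visual landmark (e.g. a storefront sign) is recognized on-device.
import math

# Surveyed positions of known signage, keyed by label (hypothetical map data).
LANDMARKS = {"cafe_entrance_sign": (47.65392, -122.30566)}

def refine_position(gps_fix, detection, distance_m, bearing_deg):
    """Snap the position estimate to a point `distance_m` meters from the
    recognized landmark, along the reverse of the camera bearing."""
    if detection not in LANDMARKS:
        return gps_fix                      # nothing recognized: keep the GPS fix
    lat, lon = LANDMARKS[detection]
    # Walk back from the landmark toward the camera (equirectangular approximation,
    # adequate over a few meters).
    back = math.radians((bearing_deg + 180.0) % 360.0)
    dlat = (distance_m * math.cos(back)) / 111_320.0
    dlon = (distance_m * math.sin(back)) / (111_320.0 * math.cos(math.radians(lat)))
    return (lat + dlat, lon + dlon)

# Usage: GPS says "near" the cafe; vision says the sign is 4 m away at bearing 90 degrees.
print(refine_position((47.65388, -122.30572), "cafe_entrance_sign", 4.0, 90.0))
```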
Assistive technologies for people with visual impairments (PVI) have made significant advancements, particularly with the integration of artificial intelligence (AI) and real-time sensor technologies. However, current solutions often require PVI to switch between multiple apps and tools for tasks like image recognition, navigation, and obstacle detection, which can hinder a seamless and efficient user experience. In this paper, we present NaviGPT, a high-fidelity prototype that integrates LiDAR-based obstacle detection, vibration feedback, and large language model (LLM) responses to provide a comprehensive and real-time navigation aid for PVI. Unlike existing applications such as Be My AI and Seeing AI, NaviGPT combines image recognition and contextual navigation guidance into a single system, offering continuous feedback on the user's surroundings without the need for app-switching. Meanwhile, NaviGPT compensates for the response delays of the LLM by using location and sensor data, aiming to provide practical and efficient navigation support for PVI in dynamic environments.
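The abstract describes combining LiDAR range readings, location, and an LLM in a single feedback loop, with sensor data bridging the LLM's latency. Below is a minimal sketch of that interaction pattern; the `query_llm` call, its latency, and the message wording are assumptions for illustration, not NaviGPT's actual implementation.

```python
# Minimal sketch (assumptions, not NaviGPT's implementation): fast sensor-based alerts
# are delivered immediately while a slower LLM scene description is fetched.
import asyncio

def obstacle_alert(lidar_distance_m):
    """Fast, local rule: warn as soon as the ranged obstacle is close."""
    if lidar_distance_m < 1.0:
        return "Obstacle within one meter ahead."
    return None

async def query_llm(image_bytes, location):
    """Placeholder for a (slow) LLM scene-description request."""
    await asyncio.sleep(2.0)  # stands in for network and model latency
    return f"You are near {location}; the walkway ahead appears clear on the left."

async def navigation_tick(lidar_distance_m, image_bytes, location):
    # 1. Immediate, sensor-based feedback (no app switching, no waiting on the LLM).
    alert = obstacle_alert(lidar_distance_m)
    if alert:
        print("[vibration + speech]", alert)
    # 2. Richer contextual guidance arrives once the LLM responds.
    description = await query_llm(image_bytes, location)
    print("[speech]", description)

asyncio.run(navigation_tick(0.8, b"", "the Main Street entrance"))
```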
... The results showed that smart technologies greatly assisted blind people during indoor and outdoor navigation. Several researchers used mixed methods to assess the effectiveness of smart technologies for supporting the mobility of blind people [16][17][18][19]. ...
... Despite the large body of work on accessible shopping for blind people, supporting recreational window shopping remains unexplored, with a few exceptions that test wayfinding and navigation systems in shopping malls (e.g., [98,101]). Yet, these efforts are intended to help blind people reach their destination effectively. While this is still an important research direction, Guerreiro et al. [36], investigating airport navigation for blind people, found a significant limitation in their travel experience even when they are supposedly assisted: they are escorted to the target gate but have no opportunity to explore nearby shops or restaurants, despite their expressed desire to "get up and move around." ...
... Similar concerns arise in systems where error risks are less critical than those in self-driving vehicles but still consequential. For instance, studies have shown that minor errors in navigation systems can lead to frustration and disorientation, even when the destination is just a few meters away (e.g., [74]). Prior studies also highlighted the need for blind users to distinguish and handle errors when understanding images with AI-based image descriptions [30,51]. ...
Object recognition technologies hold the potential to support blind and low-vision people in navigating the world around them. However, the gap between benchmark performances and practical usability remains a significant challenge. This paper presents a study aimed at understanding blind users' interaction with object recognition systems for identifying and avoiding errors. Leveraging a pre-existing object recognition system, URCam, fine-tuned for our experiment, we conducted a user study involving 12 blind and low-vision participants. Through in-depth interviews and hands-on error identification tasks, we gained insights into users' experiences, challenges, and strategies for identifying errors in camera-based assistive technologies and object recognition systems. During interviews, many participants preferred independent error review, while expressing apprehension toward misrecognitions. In the error identification task, participants varied viewpoints, backgrounds, and object sizes in their images to avoid and overcome errors. Even after repeating the task, participants identified only half of the errors, and the proportion of errors identified did not significantly differ from their first attempts. Based on these insights, we offer implications for designing accessible interfaces tailored to the needs of blind and low-vision users in identifying object recognition errors.
We present Snap&Nav, a navigation system for blind people in unfamiliar buildings that does not require prebuilt digital maps. Instead, the system uses the floor map as its primary information source for route guidance. The system requires a sighted assistant to capture an image of the floor map, which is analyzed to create a node map containing intersections, destinations, and current positions on the floor. The system provides turn-by-turn navigation instructions while tracking users' positions on the node map by detecting intersections. Additionally, the system estimates the scale of the node map relative to the real environment to provide distance information. Our system was validated through two user studies with 20 sighted and 12 blind participants. Results showed that sighted participants could process floor map images without prior familiarity with the system, while blind participants navigated with increased confidence and lower cognitive load compared to using only a cane, and appreciated the system's potential for use in various buildings.
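Snap&Nav's guidance is described as operating on a node map of intersections and destinations, with distances recovered via an estimated scale factor. A minimal sketch of turn-by-turn instruction generation from such a node map is shown below; the data structures, the scale constant, and the example route are assumptions for illustration, not the authors' code.

```python
# Minimal sketch (assumed structures, not Snap&Nav's code): a node map extracted from a
# floor-map photo, with pixel distances converted to meters via an estimated scale.
import math

# Nodes detected on the floor-map image, in pixel coordinates (hypothetical).
nodes = {"entrance": (100, 400), "hall_T": (100, 100), "room_215": (350, 100)}
route = ["entrance", "hall_T", "room_215"]
METERS_PER_PIXEL = 0.05   # estimated scale (assumption)

def heading(a, b):
    # 0 degrees points "up" on the map image; y grows downward in image coordinates.
    return math.degrees(math.atan2(b[0] - a[0], a[1] - b[1])) % 360

def turn_by_turn(route, nodes, scale):
    instructions = []
    for prev, cur, nxt in zip(route, route[1:], route[2:] + [None]):
        dist = math.dist(nodes[prev], nodes[cur]) * scale
        step = f"Walk about {dist:.0f} meters to {cur}."
        if nxt is not None:
            delta = (heading(nodes[cur], nodes[nxt]) - heading(nodes[prev], nodes[cur])) % 360
            if 45 <= delta <= 135:
                step += " Then turn right."
            elif 225 <= delta <= 315:
                step += " Then turn left."
        instructions.append(step)
    return instructions

print("\n".join(turn_by_turn(route, nodes, METERS_PER_PIXEL)))
```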
Remote sighted assistance (RSA) has emerged as a conversational technology aiding people with visual impairments (VI) through real-time video chat communication with sighted agents. We conducted a literature review and interviewed 12 RSA users to understand the technical and navigational challenges faced by both agents and users. The technical challenges were categorized into four groups: agents’ difficulties in orienting and localizing users, acquiring and interpreting users’ surroundings and obstacles, delivering information specific to user situations, and coping with poor network connections. We also presented 15 real-world navigational challenges, including 8 outdoor and 7 indoor scenarios. Given the spatial and visual nature of these challenges, we identified relevant computer vision problems that could potentially provide solutions. We then formulated 10 emerging problems that neither human agents nor computer vision can fully address alone. For each emerging problem, we discussed solutions grounded in human–AI collaboration. Additionally, with the advent of large language models (LLMs), we outlined how RSA can integrate with LLMs within a human–AI collaborative framework, envisioning the future of visual prosthetics.
Navigation assistive technologies have been designed to support individuals with visual impairments during independent mobility by providing sensory augmentation and contextual awareness of their surroundings. Such information is typically provided through predefined audio-haptic interaction paradigms. However, the individual capabilities, preferences, and behavior of people with visual impairments are heterogeneous, and may change due to experience, context, and necessity. Therefore, the circumstances and modalities for providing navigation assistance need to be personalized to different users, and over time for each user.
We conduct a study with 13 blind participants to explore how the desirability of messages provided during assisted navigation varies based on users' navigation preferences and expertise. The participants are guided through two different routes, one without prior knowledge and one previously studied and traversed. The guidance is provided through turn-by-turn instructions, enriched with contextual information about the environment. During navigation and follow-up interviews, we uncover that participants have diversified needs for navigation instructions based on their abilities and preferences. Our study motivates the design of future navigation systems capable of verbosity-level personalization, in order to keep users engaged in the current situational context while minimizing distractions.
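The study above motivates verbosity-level personalization of guidance messages. A minimal sketch of one way such filtering could work is given below; the message categories, levels, and wording are illustrative assumptions, not the system studied in the paper.

```python
# Minimal sketch (an illustration, not the study's system): filtering enriched
# turn-by-turn guidance by a per-user verbosity preference.
MESSAGES = [
    ("turn", "Turn left in 5 meters."),
    ("landmark", "You are passing the elevator bank on your right."),
    ("ambience", "There is a coffee shop ahead; it is usually crowded at noon."),
]

# Lower levels keep only essential instructions; higher levels add context.
VERBOSITY_LEVELS = {1: {"turn"}, 2: {"turn", "landmark"}, 3: {"turn", "landmark", "ambience"}}

def guidance(messages, verbosity):
    allowed = VERBOSITY_LEVELS[verbosity]
    return [text for kind, text in messages if kind in allowed]

# An expert user on a familiar route might prefer level 1; a first-time visitor level 3.
print(guidance(MESSAGES, 1))
print(guidance(MESSAGES, 3))
```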
Recent techniques for indoor localization are now able to support practical, accurate turn-by-turn navigation for people with visual impairments (PVI). Understanding user behavior as it relates to situational contexts can be used to improve the ability of the interface to adapt to problematic scenarios, and consequently reduce navigation errors. This work performs a fine-grained analysis of user behavior during indoor assisted navigation, outlining different scenarios where user behavior (either with a white-cane or a guide-dog) is likely to cause navigation errors. The scenarios include certain instructions (e.g., slight turns, approaching turns), cases of error recovery, and the surrounding environment (e.g., open spaces and landmarks). We discuss the findings and lessons learned from a real-world user study to guide future directions for the development of assistive navigation interfaces that consider the users' behavior and coping mechanisms.
Assistive applications for orientation and mobility promote independence for people with visual impairment (PVI). While typical design and evaluation of such applications involves small-sample iterative studies, we analyze large-scale longitudinal data from a geographically diverse population. Our publicly released dataset from iMove, a mobile app supporting orientation of PVI, contains millions of interactions by thousands of users over a year.
Our analysis: (i) examines common functionalities, settings, assistive features, and movement modalities in the iMove dataset, and (ii) discovers user communities based on interaction patterns. We find that the most popular interaction mode is passive: users receive frequent, often verbose notifications while in motion and perform few actions. The use of built-in assistive features such as enlarged text indicates a high presence of users with residual sight. Users fall into three distinct groups: (C1) users interested in surrounding points of interest, (C2) users interacting in short bursts to inquire about their current location, and (C3) users with long active sessions while in motion. iMove was designed with C3 in mind, and one strength of our contribution is providing meaningful semantics for the unanticipated groups, C1 and C2. Our analysis reveals insights that can be generalized to other assistive orientation and mobility applications.
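The community discovery described above groups users by interaction patterns. Below is a minimal sketch of that kind of analysis using k-means over per-user interaction features; the feature set, the toy values, and the use of scikit-learn are assumptions, not the authors' actual pipeline.

```python
# Minimal sketch (assumed features and tooling, not the iMove analysis pipeline):
# clustering users by interaction patterns with k-means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# One row per user: [notifications received, actions performed,
#                    mean session length (min), fraction of time in motion]
features = np.array([
    [120,  5,  2.0, 0.8],   # passive, in motion (C3-like)
    [ 10, 30,  1.0, 0.1],   # short bursts asking "where am I?" (C2-like)
    [ 40, 20, 15.0, 0.2],   # browsing nearby points of interest (C1-like)
    [150,  4,  3.0, 0.9],
    [  8, 25,  0.5, 0.1],
])

X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)   # community assignment per user
```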
A significant challenge faced by visually impaired people is ‘wayfinding’, the ability to find one’s way to a destination in an unfamiliar environment. This study develops a novel wayfinding system for smartphones that can automatically recognize the situation and scene objects in real time. Through analyzing streaming images, the proposed system first classifies the current situation of a user in terms of their location. Next, based on the current situation, only the necessary context objects are found and interpreted using computer vision techniques. The system estimates the motions of the user with two inertial sensors and records the trajectories of the user toward the destination, which are also used as a guide for the return route after reaching the destination. To efficiently convey the recognized results through an auditory interface, activity-based instructions are generated that guide the user in a series of movements along a route. To assess the effectiveness of the proposed system, experiments were conducted in several indoor environments, in which the situation awareness accuracy was 90% and the object detection false alarm rate was 0.016. In addition, our field test results demonstrate that users can locate their paths with an accuracy of 97%.
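The system above records the user's inertial trajectory toward the destination and reuses it to guide the return trip. A minimal sketch of that return-route idea follows; the step-event representation and the example values are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (an illustration of the return-route idea, not the authors' system):
# record a step-based trajectory from inertial data, then reverse it for the way back.
def record_trajectory(step_events):
    """Each event is (step_length_m, heading_deg) derived from accelerometer/gyroscope data."""
    return list(step_events)

def return_route(trajectory):
    """Walk the recorded segments in reverse order, with headings flipped 180 degrees."""
    return [(length, (heading + 180.0) % 360.0) for length, heading in reversed(trajectory)]

outbound = record_trajectory([(0.7, 0.0)] * 10 + [(0.7, 90.0)] * 5)   # 7 m north, then 3.5 m east
for length, heading in return_route(outbound)[:3]:
    print(f"Walk {length:.1f} m at heading {heading:.0f} degrees")
```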
Research Methods in Human-Computer Interaction is a comprehensive guide to performing research and is essential reading for both quantitative and qualitative methods. Since the first edition was published in 2009, the book has been adopted for use at leading universities around the world, including Harvard University, Carnegie-Mellon University, the University of Washington, the University of Toronto, HiOA (Norway), KTH (Sweden), Tel Aviv University (Israel), and many others. Chapters cover a broad range of topics relevant to the collection and analysis of HCI data, going beyond experimental design and surveys, to cover ethnography, diaries, physiological measurements, case studies, crowdsourcing, and other essential elements in the well-informed HCI researcher's toolkit. Continual technological evolution has led to an explosion of new techniques and a need for this updated 2nd edition, to reflect the most recent research in the field and newer trends in research methodology. This Research Methods in HCI revision contains updates throughout, including more detail on statistical tests, coding qualitative data, and data collection via mobile devices and sensors. Other new material covers performing research with children, older adults, and people with cognitive impairments.
The majority of information in the physical environment is conveyed visually, meaning that people with vision impairments often lack access to the shared cultural, historical, and practical features that define a city. How can someone who is blind find out about the sleek skyscrapers that dot a modern city's skyline, historic cannons that have been remade into traffic pillars, or ancient trees that uproot a neighborhood's sidewalks? We present FootNotes, a system that embeds rich textual descriptions of objects and locations in OpenStreetMap, a popular geowiki. Both sighted and blind users can annotate the physical environment with functional, visual, historical, and social descriptions. We report on the experience of ten participants with vision impairments who used a spatialized audio application to interact with these annotations while exploring a city. By sharing rich annotations of physical objects and areas, FootNotes helps people thoroughly explore a new location or serendipitously discover previously unknown features of familiar environments.
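FootNotes is described as pairing geowiki annotations with a spatialized audio application. Below is a minimal sketch of one piece of that pairing, selecting nearby annotated features and computing the relative bearing at which each description would be rendered; the annotation entries and coordinates are hypothetical, not FootNotes data.

```python
# Minimal sketch (an illustration, not FootNotes itself): computing the relative bearing
# used to place each annotation's description in spatialized audio.
import math

ANNOTATIONS = [   # (lat, lon, description) -- hypothetical geowiki entries
    (47.6205, -122.3493, "Historic cannon remade into a traffic pillar."),
    (47.6210, -122.3490, "Sleek glass skyscraper, 40 stories, built in 2015."),
]

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees."""
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(math.radians(lat2))
    x = (math.cos(math.radians(lat1)) * math.sin(math.radians(lat2))
         - math.sin(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360

def nearby_descriptions(user_lat, user_lon, user_heading):
    for lat, lon, text in ANNOTATIONS:
        relative = (bearing_deg(user_lat, user_lon, lat, lon) - user_heading) % 360
        yield relative, text    # an audio engine would pan the speech toward `relative`

for rel, text in nearby_descriptions(47.6207, -122.3495, 0.0):
    print(f"{rel:5.1f} degrees: {text}")
```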
Humans rely on properties of the materials that make up objects to guide our interactions with them. Grasping smooth materials, for example, requires care, and softness is an ideal property for fabric used in bedding. Even when these properties are not visual (e.g. softness is a physical property), we may still infer their presence visually. We refer to such material properties as visual material attributes. Recognizing these attributes in images can contribute valuable information for general scene understanding and material recognition. Unlike well-known object and scene attributes, visual material attributes are local properties with no fixed shape or spatial extent. We show that given a set of images annotated with known material attributes, we may accurately recognize the attributes from small local image patches. Obtaining such annotations in a consistent fashion at scale, however, is challenging. To address this, we introduce a method that allows us to probe the human visual perception of materials by asking simple yes/no questions comparing pairs of image patches. This provides sufficient weak supervision to build a set of attributes and associated classifiers that, while unnamed, serve the same function as the named attributes we use to describe materials. Doing so allows us to recognize visual material attributes without resorting to exhaustive manual annotation of a fixed set of named attributes. Furthermore, we show that this method may be integrated in the end-to-end learning of a material classification CNN to simultaneously recognize materials and discover their visual attributes. Our experimental results show that visual material attributes, whether named or automatically discovered, provide a useful intermediate representation for known material categories themselves as well as a basis for transfer learning when recognizing previously-unseen categories.
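The abstract above describes using simple yes/no comparisons between patch pairs as weak supervision for discovering unnamed material attributes. A minimal sketch of that general idea is shown below, clustering a pairwise-agreement matrix into patch groups; the toy matrix and the use of spectral clustering are illustrative assumptions, not the paper's method.

```python
# Minimal sketch (an illustration of the weak-supervision idea, not the paper's method):
# yes/no answers about patch pairs form a similarity matrix; clustering it yields groups
# of patches that share an (unnamed) visual material attribute.
import numpy as np
from sklearn.cluster import SpectralClustering

# answers[i, j] = 1 if annotators said patches i and j look alike for some attribute.
answers = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
], dtype=float)

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(answers)
print(labels)   # patch groups; a classifier trained per group plays the role of a named attribute
```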
Navigating in unfamiliar environments is challenging for most people, especially for individuals with visual impairments. While many personal navigation tools have been proposed to enable independent indoor navigation, they have insufficient accuracy (e.g., 5-10 m), do not provide semantic features about surroundings (e.g., doorways, shops, etc.), and may require specialized devices to function. Moreover, the deployment of many systems is often only evaluated in constrained scenarios, which may not precisely reflect the performance in the real world. Therefore, we have designed and implemented NavCog3, a smartphone-based indoor navigation assistant that has been evaluated in a 21,000 m² shopping mall. In addition to turn-by-turn instructions, it provides information on landmarks (e.g., tactile paving) and points of interest nearby. We first conducted a controlled study with 10 visually impaired users to assess localization accuracy and the perceived usefulness of semantic features. To understand the usability of the app in a real-world setting, we then conducted another study with 43 participants with visual impairments where they could freely navigate in the shopping mall using NavCog3. Our findings suggest that NavCog3 can open a new opportunity for users with visual impairments to independently find and visit large and complex places with confidence.