Article

Luminophonics experiment: A user study on visual sensory substitution device

Abstract

Loss of vision is a severe impairment to the dominant sensory system. It often has a catastrophic effect upon the sufferer, with knock-on effects on their standard of living, their ability to support themselves, and the lives of their care-givers. Research into visual impairments is multi-faceted, focusing on the causes of these debilitating conditions as well as on easing the daily lives of affected individuals. One approach is the use of sensory substitution devices. Our proposed system, Luminophonics, focuses on visual-to-auditory cross-modal information conversion. A visual-to-audio sensory substitution device is a system that obtains a continual stream of visual inputs and converts it into a corresponding auditory soundscape. Ultimately, such a device allows the visually impaired to visualize their surrounding environment simply by listening to the generated soundscape. Even though devices of this kind have huge potential, public usage is still minimal (Loomis, 2010). To promote adoption by the visually impaired, the overall performance of these devices needs to be improved in terms of soundscape interpretability, information preservation, and listening comfort, amongst other factors. Luminophonics has developed three prototypes, which we have used to explore different ideas pertaining to visual-to-audio sensory substitution. In addition, one of the prototypes has been extended to include depth information using a time-of-flight camera. Previously, an automated measurement method was used to evaluate the performance of the three prototypes (Tan, 2013); those results cover effectiveness in terms of interpretability and information preservation. The main purpose of the experiment reported herein was to test the prototypes on human subjects in order to gain greater insight into how they perform in real-life situations.
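For readers unfamiliar with such systems, the sketch below illustrates the basic capture-and-sonify loop of a generic visual-to-audio device. The row-to-pitch mapping, frame size, and timing constants here are illustrative assumptions only, not the Luminophonics scheme (which the paper develops across three prototype variants).

```python
# Minimal sketch of a visual-to-auditory SSD loop (illustrative only; the
# actual Luminophonics mapping is not reproduced here). Assumes OpenCV for
# capture and NumPy for the synthetic mapping.
import cv2
import numpy as np

SAMPLE_RATE = 44100          # audio samples per second (assumed)
FRAME_DURATION = 1.0         # seconds of soundscape per video frame (assumed)

def frame_to_soundscape(gray):
    """Toy mapping: each image row drives one sine partial; row height
    sets pitch, mean row brightness sets amplitude."""
    t = np.linspace(0.0, FRAME_DURATION, int(SAMPLE_RATE * FRAME_DURATION))
    rows, _ = gray.shape
    sound = np.zeros_like(t)
    for r in range(rows):
        freq = 200.0 + 3000.0 * (1.0 - r / rows)   # top of image = high pitch
        amp = gray[r].mean() / 255.0               # brighter row = louder
        sound += amp * np.sin(2.0 * np.pi * freq * t)
    return sound / max(rows, 1)

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(cv2.resize(frame, (64, 64)), cv2.COLOR_BGR2GRAY)
    soundscape = frame_to_soundscape(gray)  # write or play with any audio API
cap.release()
```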
... Despite the long research tradition and despite numerous publications with promising results, the concept has not achieved a real breakthrough to date. The exact reasons for the low number of available devices and users have frequently been discussed and made the subject of improvement proposals (Bach-y-Rita & W. Kercel 2003: 541; Beckmann 2013; Lenay et al. 2003: 10 f.; Meijer 1992: 112). The idea of using depth images in SSDs has been proposed several times, but implementations are rare and none of the proposals has yet reached market maturity (Bujacz 2010: 18; Filipe et al. 2012; Gholamalizadeh et al. 2017; Gomez et al. 2014: 20; Hamilton-Fletcher et al. 2016; Morar et al. 2017: 694; Sánchez 2015; Stoll et al. 2015; Tan et al. 2015). Shrewsbury (2011) also proposed a concept that is functionally similar to the prototype put forward in this thesis. ...
Thesis
Full-text available
As this is my bachelor thesis, the paper itself is available only in German - sorry for that. This project deals with the phenomenon of sensory substitution, by which the function of a missing or faulty sensory modality is replaced (substituted) by stimulating another one. During the thesis, a device was developed that aims to enable blind people to haptically experience their surroundings and spatial depth through vibration, so that they can detect obstacles and orient themselves within space in order to better cope with their daily activities. More information (German and English): https://unfoldingspace.jakobkilian.de
Article
Full-text available
Visual-to-auditory conversion systems have been in existence for several decades. Besides being among the front runners in providing visual capabilities to blind users, the auditory cues generated by image sonification systems are still easier to learn and adapt to compared with other similar techniques. Other advantages include low cost, easy customizability, and universality. However, every system developed so far has its own set of strengths and weaknesses. In order to improve these systems further, we propose an automated and quantitative method to measure the performance of such systems. With these quantitative measurements, it is possible to gauge the relative strengths and weaknesses of different systems and rank the systems accordingly. Performance is measured by both the interpretability and the information preservation of visual-to-auditory conversions. Interpretability is measured by computing the correlation of inter-image distance (IID) and inter-sound distance (ISD), whereas information preservation is computed by applying information theory to measure the entropy of both visual and corresponding auditory signals. These measurements provide a basis and some insights into how the systems work. With an automated interpretability measure as a standard, more image sonification systems can be developed, compared, and then improved. Even though the measure does not test systems as thoroughly as carefully designed psychological experiments, a quantitative measurement like the one proposed here can compare systems to a certain degree without incurring much cost. Underlying this research is the hope that a major breakthrough in image sonification systems will allow blind users to cost-effectively regain enough visual function to lead secure and productive lives.
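A hedged sketch of the interpretability measure described above: correlate pairwise inter-image distances (IID) with the corresponding inter-sound distances (ISD), and estimate entropy from a value histogram. The Euclidean metric, Pearson correlation, and histogram binning are assumptions; the paper's exact distance and entropy computations may differ.

```python
# Sketch: IID/ISD correlation and a histogram-based entropy proxy.
import numpy as np
from itertools import combinations

def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair of flattened signals."""
    return np.array([np.linalg.norm(a - b) for a, b in combinations(vectors, 2)])

def interpretability(images, sounds):
    """Pearson correlation between IID and ISD; 1.0 would mean the
    sonification preserves relative distances perfectly."""
    iid = pairwise_distances([img.ravel().astype(float) for img in images])
    isd = pairwise_distances([snd.ravel().astype(float) for snd in sounds])
    return np.corrcoef(iid, isd)[0, 1]

def entropy(signal, bins=256):
    """Shannon entropy (bits) of a signal's value histogram, a simple
    stand-in for the information-preservation measure."""
    hist, _ = np.histogram(signal.ravel(), bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())
```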
Article
Full-text available
This book focuses on Active Vision: the psychology of looking and seeing. The authors present a view of the process of seeing, with a particular emphasis on visual attention. They contend that the regular sampling of the environment with eye movements is the normal process of visual attention. Several sections of the book are devoted to the neurophysiological substrates underpinning the processes of active vision. Topics covered include visual orienting, visual selection, covert attention, eye movements, neural scenes and activities, human neuropsychology, and space constancy and trans-saccadic integration.
Article
Full-text available
The goal of the See ColOr project is to achieve a noninvasive mobility aid for blind users that uses the auditory pathway to represent frontal image scenes in real time. We present and discuss two image processing methods investigated in this work: image simplification by means of segmentation, and guiding the focus of attention through the computation of visual saliency. A mean-shift segmentation technique gave the best results, but to meet real-time constraints we simply implemented an image quantification method based on the HSL colour system. More particularly, we have developed two prototypes which transform HSL-coloured pixels into spatialised classical instrument sounds lasting 300 ms. Hue is sonified by the timbre of a musical instrument, saturation by one of four possible notes, and luminosity is represented by bass when it is rather dark and by a singing voice when it is relatively bright. The first prototype is devoted to static images on the computer screen, while the second is built on a stereoscopic camera which estimates depth by triangulation. In the audio encoding, distance to objects is quantified into four duration levels. Six participants with their eyes covered by a dark tissue were trained to associate colours with musical instruments and then asked to identify objects with specific shapes and colours in several pictures. In order to simplify the experimental protocol, we used a tactile tablet in place of the camera. Overall, colour was helpful for the interpretation of image scenes. Moreover, preliminary results with the second prototype, consisting of the recognition of coloured balloons, were very encouraging. Image processing techniques such as saliency could in future accelerate the interpretation of sonified image scenes.
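The abstract fixes only the structure of the See ColOr mapping (hue to timbre, saturation to one of four notes, luminosity to bass versus voice, depth to four duration levels). The sketch below fills in that structure with assumed instrument names, notes, and depth bins purely for illustration; the project's actual lookup tables are not given here.

```python
# Illustrative See ColOr-style pixel-to-sound lookup. The instrument list,
# note choices, and depth bins are assumptions; only the hue/saturation/
# luminosity/depth structure comes from the abstract above.
import colorsys

INSTRUMENTS = ["oboe", "viola", "trumpet", "piano", "flute", "saxophone"]
NOTES = ["C4", "E4", "G4", "C5"]          # four possible notes for saturation
DURATIONS_MS = [90, 160, 230, 300]        # four duration levels for distance

def pixel_to_sound(r, g, b, depth_m, max_depth_m=4.0):
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    timbre = INSTRUMENTS[min(int(h * len(INSTRUMENTS)), len(INSTRUMENTS) - 1)]
    note = NOTES[min(int(s * len(NOTES)), len(NOTES) - 1)]
    register = "bass" if l < 0.5 else "singing voice"   # dark vs. bright
    level = min(int(depth_m / max_depth_m * len(DURATIONS_MS)),
                len(DURATIONS_MS) - 1)
    return {"timbre": timbre, "note": note,
            "register": register, "duration_ms": DURATIONS_MS[level]}
```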
Article
Full-text available
Spatial-temporal gaze behaviour patterns were analysed as normal participants wearing a mobile eye tracker were required to step on 17 footprints, regularly or irregularly spaced over a 10-m distance, placed in their travel path. We examined the characteristics of two types of gaze fixation with respect to the participants' stepping patterns: footprint fixation, and travel fixation, in which the gaze is stable and travelling at the speed of the whole body. The results showed that travel gaze fixation is the dominant gaze behaviour, occupying over 50% of the travel time. It is hypothesised that this gaze behaviour facilitates acquisition of environmental and self-motion information from the optic flow generated during locomotion; this in turn would guide movements of the lower limbs to the appropriate landing targets. When participants did fixate on the landing target, they did so on average two steps ahead, about 800-1000 ms before the limb was placed on the target area. This would allow them sufficient time to successfully modify their gait patterns. None of the gaze behaviours was influenced by the placement (regularly versus irregularly spaced) of the footprints or by repeated exposures to the travel path. Rather, visual information acquired during each trial was used "de novo" to modulate gait patterns. This study provides a clear temporal link between gaze and stepping patterns and adds to our understanding of how vision is used to regulate locomotion.
Article
Full-text available
During performance of natural tasks subjects sometimes fixate objects that are manipulated several seconds later. Such early looks are known as "look-ahead fixations" (Pelz and Canosa in Vision Res 41(25-26):3587-3596, 2001). To date, little is known about their function. To investigate the possible role of these fixations, we measured fixation patterns in a model-building task. Subjects assembled models in two sequences where reaching and grasping were interrupted in one sequence by an additional action. Results show look-ahead fixations prior to 20% of the reaching and grasping movements, occurring on average 3 s before the reach. Their frequency was influenced by task sequence, suggesting that they are purposeful and have a role in task planning. To see if look-aheads influenced the subsequent eye movement during the reach, we measured eye-hand latencies and found they increased by 122 ms following a look-ahead to the target. The initial saccades to the target that accompanied a reach were also more accurate following a look-ahead. These results demonstrate that look-aheads influence subsequent visuo-motor coordination, and imply that visual information on the temporal and spatial structure of the scene was retained across intervening fixations and influenced subsequent movement programming. Additionally, head movements that accompanied look-aheads were significantly smaller in amplitude (by 10 degrees) than those that accompanied reaches to the same locations, supporting previous evidence that head movements play a role in the control of hand movements. This study provides evidence of the anticipatory use of gaze in acquiring information about objects for future manipulation.
Article
Full-text available
An experimental system for the conversion of images into sound patterns was designed to provide auditory image representations within some of the known limitations of the human hearing system, possibly as a step towards the development of a vision substitution device for the blind. The application of an invertible (one-to-one) image-to-sound mapping ensures the preservation of visual information. The system implementation involves a pipelined special-purpose computer connected to a standard television camera. A novel design and the use of standard components have made for a low-cost portable prototype conversion system with a power dissipation suitable for battery operation. Computerized sampling of the system output and subsequent calculation of the approximate inverse (sound-to-image) mapping provided the first convincing experimental evidence for the preservation of visual information in sound representations of complicated images.
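One well-known invertible mapping in the spirit of this system scans image columns left to right over time, assigns each row a fixed sine frequency, and lets pixel brightness set that partial's amplitude. The sketch below implements that idea; the frequency range and per-column timing are assumptions, not the original hardware's parameters.

```python
# Sketch of an invertible (one-to-one) image-to-sound mapping: per-column
# Fourier analysis of the output can recover every pixel's brightness (up to
# quantisation), which is what makes the mapping approximately invertible.
import numpy as np

SAMPLE_RATE = 22050
COLUMN_MS = 20          # playback time per image column (assumed)

def sonify(image):
    """image: 2-D float array in [0, 1], row 0 = top (highest pitch)."""
    rows, cols = image.shape
    n = int(SAMPLE_RATE * COLUMN_MS / 1000)
    freqs = np.geomspace(5000.0, 500.0, rows)        # one frequency per row
    t = np.arange(n) / SAMPLE_RATE
    basis = np.sin(2 * np.pi * np.outer(freqs, t))   # rows x n
    # each image column weights the row-frequency basis for one time slice
    return np.concatenate([image[:, c] @ basis for c in range(cols)]) / rows
```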
Article
Full-text available
The concept of a real-time range camera without moving parts is described, based on the time-of-flight (TOF) principle. It operates with modulated visible and near-infrared radiation, which is detected and demodulated simultaneously by a 2-D array of lock-in pixels employing the charge-coupled device principle. Each pixel individually measures the amplitude, offset and phase of the received radiation. The theoretical resolution limit of this TOF range camera is derived; it depends on the square root of the detected background radiation and the inverse of the modulation amplitude. Actual measurements of 3-D sequences acquired at 10 range images per second show excellent agreement between our theory and the observed results. A range resolution of a few centimeters over a range of 10 m, with an illumination power of a few hundred milliwatts, is obtained in laboratory scenes for noncooperative, diffusely reflecting objects.
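For concreteness, a phase-measuring TOF camera recovers distance from the demodulated phase, and the abstract's resolution claim can be written as a scaling law. The constant factor in the second relation depends on the demodulation scheme and is omitted here; only the stated dependencies are taken from the abstract.

```latex
% Distance from the demodulated phase \varphi at modulation frequency f
% (c is the speed of light; the factor 4\pi accounts for the round trip):
\[ d = \frac{c\,\varphi}{4\pi f} \]
% Range-resolution scaling stated in the abstract, with B the detected
% background level and A the modulation amplitude:
\[ \Delta L \propto \frac{\sqrt{B}}{A} \]
```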
Article
Sensory substitution devices (SSDs) have come a long way since they were first developed for visual rehabilitation. They have produced exciting experimental results and have furthered our understanding of the human brain. Unfortunately, they are still not used for practical visual rehabilitation and are currently considered reserved primarily for experiments in controlled settings. Over the past decade, our understanding of the neural mechanisms behind visual restoration has changed as a result of converging evidence, much of which was gathered with SSDs. This evidence suggests that the brain is more than a pure sensory machine; rather, it is a highly flexible task machine, i.e., brain regions can maintain or regain their function in vision even with input from other senses. This complements a recent set of more promising behavioural achievements using SSDs and promising new technologies and tools. All these changes strongly suggest that the time has come to revive the focus on practical visual rehabilitation with SSDs, and we chart several key steps in this direction, such as training protocols and self-training tools.
Conference Paper
Luminophonics is a system that aims to maximize cross-modality conversion of information, specifically from the visual to auditory modalities, with the motivation to develop a better assistive technology for the visually impaired by using image sonification techniques. The project aims to research and develop generic and highly-configurable components concerned with different image processing techniques, attention mechanisms, orchestration approaches and psychological constraints. The swiping method that is introduced in this paper combines several techniques in order to explicitly convert the colour, size and position of objects. Preliminary tests suggest that the approach is valid and deserves further investigation.
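To illustrate what a swiping conversion can look like, the sketch below sweeps a virtual cursor across the image and turns each detected object into a tone whose onset encodes horizontal position, pitch encodes vertical position, and loudness encodes size. This is a generic illustration under assumed constants, not the paper's actual swiping method, which also encodes colour explicitly.

```python
# Generic "swipe" sonification sketch; the actual Luminophonics mapping is
# richer and is not reproduced here.
import numpy as np

SAMPLE_RATE = 22050
SWIPE_SECONDS = 2.0     # one full left-to-right sweep (assumed)

def swipe_sonify(objects, width, height):
    """objects: list of (x, y, area) tuples from any blob detector."""
    out = np.zeros(int(SAMPLE_RATE * SWIPE_SECONDS))
    tone_len = SAMPLE_RATE // 4                      # 250 ms tone per object
    for x, y, area in objects:
        onset = int(x / width * (len(out) - tone_len))   # left = earlier
        freq = 200.0 + 2000.0 * (1.0 - y / height)       # higher = higher pitch
        amp = min(area / (width * height), 1.0)          # bigger = louder
        t = np.arange(tone_len) / SAMPLE_RATE
        out[onset:onset + tone_len] += amp * np.sin(2 * np.pi * freq * t)
    return out
```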
Article
Finding one's way around an environment and remembering the events that occur within it are crucial cognitive abilities that have been linked to the hippocampus and medial temporal lobes. Our review of neuropsychological, behavioral, and neuroimaging studies of human hippocampal involvement in spatial memory concentrates on three important concepts in this field: spatial frameworks, dimensionality, and orientation and self-motion. We also compare variation in hippocampal structure and function across and within species. We discuss how its spatial role relates to its accepted role in episodic memory. Five related studies use virtual reality to examine these two types of memory in ecologically valid situations. While processing of spatial scenes involves the parahippocampus, the right hippocampus appears particularly involved in memory for locations within an environment, with the left hippocampus more involved in context-dependent episodic or autobiographical memory.