Chapter

Obstacle Detection Based on Generative Adversarial Networks and Fuzzy Sets for Computer-Assisted Navigation


Abstract

Obstacle detection addresses the detection of an object, of any kind, that interferes with the canonical trajectory of a subject, such as a human or an autonomous robotic agent. Prompt obstacle detection can become critical for the safety of visually impaired individuals (VII). In this context, we propose a novel methodology for obstacle detection, which is based on a Generative Adversarial Network (GAN) model, trained with human eye fixations to predict saliency, and the depth information provided by an RGB-D sensor. A method based on fuzzy sets is used to translate the 3D spatial information into linguistic values easily comprehensible by VII. Fuzzy operators are applied to fuse the spatial information with the saliency information for the purpose of detecting and determining whether an object may interfere with the safe navigation of the VII. For the evaluation of our method, we captured outdoor video sequences of 10,170 frames in total, with obstacles including rocks, trees and pedestrians. The results showed that the use of fuzzy representations results in enhanced obstacle detection accuracy, reaching 88.1%.
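As a rough illustration of the fuzzy step (not the authors' exact membership functions or operators), depth can be mapped to linguistic distance terms via triangular memberships and fused with saliency using the min t-norm as fuzzy AND; all thresholds below are hypothetical:

```python
# Minimal sketch of the fuzzy translation/fusion idea described in the
# abstract. Membership breakpoints and term names are assumptions.
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership with support [a, c] and peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-9),
                                 (c - x) / (c - b + 1e-9)), 0.0)

def distance_terms(depth_m):
    """Map a depth reading (meters) to hypothetical linguistic terms."""
    return {
        "very close": triangular(depth_m, 0.0, 0.5, 1.5),
        "close":      triangular(depth_m, 1.0, 2.0, 3.0),
        "far":        triangular(depth_m, 2.5, 5.0, 8.0),
    }

def obstacle_degree(depth_m, saliency):
    """Fuse proximity and saliency with the min t-norm (fuzzy AND)."""
    terms = distance_terms(depth_m)
    proximity = max(terms["very close"], terms["close"])
    return min(proximity, saliency)

# Example: a highly salient object 1.2 m away yields a nonzero degree.
print(obstacle_degree(1.2, saliency=0.9))
```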


... Regarding the support of individuals with disabilities, and especially visually impaired individuals (VIIs), smart wearable assistive systems have been proposed [6], including object detection [7] and text recognition systems [8,9]. However, even though various wearable assistive systems have been developed for the safe navigation of VIIs [6], most of them focus on obstacle detection and avoidance [10][11][12]. The majority of them have been applied mainly in indoor environments, whereas only a few address RP tasks [13,14]. ...
... The RP module interacts with another module of the assistive system, dedicated to obstacle detection (OD). The OD module can be based on one of the current methodologies proposed for this purpose, such as [10][11][12]. The RP module generates an optimal route in the area under examination, which can be dynamically updated based on information on the location of possible unmapped obstacles appearing in the user's way. Figure 1 shows a pair of smart glasses equipped with cameras, a commonly adopted wearable assistive system for VIIs [6], as an illustration of the use of the RP module. ...
Article
Full-text available
Route planning (RP) enables individuals to navigate in unfamiliar environments. Current RP methodologies generate routes that optimize criteria relevant to the traveling distance or time, whereas most of them do not consider personal preferences or needs. Also, most of the current smart wearable assistive navigation systems offer limited support to individuals with disabilities by providing obstacle avoidance instructions, but often neglecting their special requirements with respect to the route quality. Motivated by the mobility needs of such individuals, this study proposes a novel RP framework for assistive navigation that copes with these open issues. The framework is based on a novel mixed 0–1 integer nonlinear programming model for solving the RP problem with constraints originating from the needs of individuals with disabilities; unlike previous models, it minimizes: (1) the collision risk with obstacles within a path by prioritizing the safer paths; (2) the walking time; (3) the number of turns by constructing smooth paths; and (4) the loss of cultural interest by penalizing multiple crossovers of the same paths, while satisfying user preferences, such as points of interest to visit and a desired tour duration. The proposed framework is applied for the development of a system module for safe navigation of visually impaired individuals (VIIs) in outdoor cultural spaces. The module is evaluated in a variety of navigation scenarios with different parameters. The results demonstrate the comparative advantage of our RP model over relevant state-of-the-art models, by generating safer and more convenient routes for the VIIs.
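An illustrative weighted form of such an objective (the paper's actual model is a mixed 0–1 integer nonlinear program; the weights \(w_i\), set of requested points of interest \(S\), and time budget \(T_{\max}\) below are notation for illustration only):

\[
\min_{P \in \mathcal{P}} \; w_1\,\mathrm{Risk}(P) + w_2\,\mathrm{Time}(P) + w_3\,\mathrm{Turns}(P) + w_4\,\mathrm{Revisits}(P)
\qquad \text{s.t.} \quad S \subseteq \mathrm{POIs}(P), \quad \mathrm{Time}(P) \le T_{\max}
\]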
... The requirements under examination are achievable. The performance of the overall system was promising, as proved both by the evaluation and by the published studies [89,90]. However, considerable work is still needed on the design of the eyeglass frame and the attached camera in order to meet the ergonomic requirements. ...
Article
Full-text available
The marginalization of people with disabilities, such as visually impaired individuals (VIIs), has driven scientists to take advantage of the fast growth of smart technologies and develop smart assistive systems (SASs) to bring VIIs back to social life, education and even to culture. Our research focuses on developing a human–computer interactive system that will guide VIIs in outdoor cultural environments by offering universal access to cultural information, social networking and safe navigation among other services. The VI users interact with computer-based SAS to control the system during its operation, while having access to remote connection with non-VIIs for external guidance and company. The development of such a system needs a user-centered design (UCD) that incorporates the elicitation of the necessary requirements for a satisfying operation for the VI users. In this paper, we present a novel SAS system for VIIs and its design considerations, which follow a UCD approach to determine a set of operational, functional, ergonomic, environmental and optional requirements of the system. Both VIIs and non-VIIs took part in a series of interviews and questionnaires, from which data were analyzed to form the requirements of the system for both the on-site and remote use. The final requirements are tested by trials and their evaluation and results are presented. The experimental investigations gave significant feedback for the development of the system, throughout the design process. The most important contribution of this study is the derivation of requirements applicable not only to the specific system under investigation, but also to other relevant SASs for VIIs.
... In recent years, with the development of sensors and mobile computing, a wide variety of portable navigation systems have been proposed to assist VIP to avoid obstacles, navigate (Jayakody et al.; Donati et al.), and perceive the environment. Positioning plays an important role in assisting VIP. ...
Conference Paper
Obstacle detection has been a relevant issue for the implementation of autonomous robotic systems, within which increasingly robust algorithms, especially Deep Learning techniques, have begun to be applied. However, these have not been widely used for the detection of obstacles by static robotic agents, contrary to what happens with mobile agents. For this reason, this work explores the use of one of these techniques, a neural network based on Faster R-CNN, focused on detecting a specific obstacle (hands) in an application environment for a food assistance robot. For this purpose, a database containing 6205 training images and 1350 validation images was prepared, in which 31 users perform different movements with their hands. To verify the capacity of the network, 3 architectures of different depths were implemented, evaluated and compared, with the deepest network obtaining the highest accuracy, 77.4%, taking into account that the hands are not only still but also in motion, which distorts them and makes their detection more difficult. Also, the internal behavior of the network was visualized through its activations, to verify what it had learned; this showed that the network managed to focus on the hands, with some activations located on parts of the user's body such as the face and arm.
Chapter
Full-text available
The Smart Glass represents a potential aid for people who are visually impaired that might lead to improvements in quality of life. The smart glass is for people who need to navigate independently and feel socially comfortable and secure while they do so. It is based on the simple idea that blind people do not want to stand out while using assistive tools. This paper focuses on the significant work done in the field of wearable electronics and the features that come as add-ons. The Smart Glass uses ultrasonic sensors to detect objects ahead in real time and feeds the readings to a Raspberry Pi, which analyzes whether the object is an obstacle or a person. It can also determine whether an object is closing in very fast and, if so, provides a warning through vibrations in the recognized direction. An added GSM feature lets the user make a call in an emergency situation. The software framework of the whole system is managed using the Robot Operating System (ROS). It is developed as a ROS catkin workspace with the necessary packages and nodes, and the ROS was loaded onto the Raspberry Pi running Ubuntu MATE.
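The closing-in warning described above reduces to a range-plus-closing-speed check over consecutive ultrasonic readings; a minimal sketch, assuming a fixed sampling rate (the thresholds WARN_DISTANCE_M and FAST_APPROACH_MPS are illustrative, not from the paper):

```python
# Sketch of the warning logic: compare the current range against a
# distance threshold and estimate closing speed from two samples.
WARN_DISTANCE_M = 1.5      # hypothetical warning distance
FAST_APPROACH_MPS = 0.8    # hypothetical closing-speed threshold
SAMPLE_PERIOD_S = 0.1      # assumed sensor sampling period

def check_obstacle(prev_range_m, curr_range_m):
    """Return a warning level from two consecutive ultrasonic readings."""
    closing_speed = (prev_range_m - curr_range_m) / SAMPLE_PERIOD_S
    if curr_range_m < WARN_DISTANCE_M and closing_speed > FAST_APPROACH_MPS:
        return "strong vibration: object closing in fast"
    if curr_range_m < WARN_DISTANCE_M:
        return "gentle vibration: obstacle ahead"
    return "clear"

print(check_obstacle(1.6, 1.4))  # closing at 2 m/s inside the threshold
```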
Article
Full-text available
In this paper, we introduce the so-called DEEP-SEE framework that jointly exploits computer vision algorithms and deep convolutional neural networks (CNNs) to detect, track and recognize in real time objects encountered during navigation in the outdoor environment. A first feature concerns an object detection technique designed to localize both static and dynamic objects without any a priori knowledge about their position, type or shape. The methodological core of the proposed approach relies on a novel object tracking method based on two convolutional neural networks trained offline. The key principle consists of alternating between tracking using motion information and predicting the object location in time based on visual similarity. The validation of the tracking technique is performed on standard benchmark VOT datasets, and shows that the proposed approach returns state-of-the-art results while minimizing the computational complexity. Then, the DEEP-SEE framework is integrated into a novel assistive device, designed to improve cognition of VI people and to increase their safety when navigating in crowded urban scenes. The validation of our assistive device is performed on a video dataset with 30 elements acquired with the help of VI users. The proposed system shows high accuracy (>90%) and robustness (>90%) scores regardless of the scene dynamics.
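The alternation described above (motion-based tracking interleaved with similarity-based re-localization) can be sketched as a control loop; `detect_sim`, `track_motion`, and the re-localization period are placeholders standing in for the paper's two offline-trained CNNs, not their actual interfaces:

```python
# Skeleton of the alternation principle: track frame to frame with
# motion cues, and periodically re-localize by visual similarity.
def track(frames, detect_sim, track_motion, reloc_every=10):
    box = detect_sim(frames[0], None)          # initial localization
    boxes = [box]
    for t, frame in enumerate(frames[1:], start=1):
        if t % reloc_every == 0:
            box = detect_sim(frame, box)       # similarity-based prediction
        else:
            box = track_motion(frame, box)     # motion-based tracking
        boxes.append(box)
    return boxes

# Trivial demo with stub callables (a real system would use the two CNNs).
boxes = track(frames=[0, 1, 2, 3],
              detect_sim=lambda frame, box: (0, 0, 10, 10),
              track_motion=lambda frame, box: box)
```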
Article
Full-text available
In this work, we give a new twist to monocular obstacle detection. Most of the existing approaches either rely on Visual SLAM systems or on depth estimation models to build 3D maps and detect obstacles. Despite their success, these methods are not specifically devised for monocular obstacle detection. In particular, they are not robust to appearance and camera intrinsics changes or texture-less scenarios. To overcome these limitations, we propose an end-to-end deep architecture that jointly learns to detect obstacles and estimate their depth. The multi-task nature of this strategy strengthens both the obstacle detection task, with more reliable bounding boxes and range measures, and the depth estimation task, with robustness to scenario changes. We call this architecture J-MOD$^{2}$. We prove the effectiveness of our approach with experiments on sequences with different appearance and focal lengths. Furthermore, we show its benefits on a set of simulated navigation experiments where a MAV explores an unknown scenario and plans safe trajectories by using our detection model.
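A joint objective of this kind is typically a weighted sum of the two task losses optimized end-to-end; a generic form (the balancing weight \(\lambda\) is an assumption, not the paper's exact loss composition):

\[
\mathcal{L}_{\text{joint}} = \mathcal{L}_{\text{det}} + \lambda\,\mathcal{L}_{\text{depth}}
\]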
Conference Paper
Full-text available
We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. SSD is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stages and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets confirm that SSD has competitive accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. For \(300 \times 300\) input, SSD achieves 74.3% mAP on VOC2007 test at 59 FPS on a Nvidia Titan X and for \(512 \times 512\) input, SSD achieves 76.9% mAP, outperforming a comparable state-of-the-art Faster R-CNN model. Compared to other single stage methods, SSD has much better accuracy even with a smaller input image size. Code is available at https://github.com/weiliu89/caffe/tree/ssd.
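The default-box construction can be sketched directly from the description above; the paper derives per-layer scales as \(s_k = s_{\min} + \frac{s_{\max}-s_{\min}}{m-1}(k-1)\) and sets box width \(s\sqrt{a_r}\) and height \(s/\sqrt{a_r}\) per aspect ratio, while the specific scale value and ratio set below are illustrative:

```python
# Sketch of SSD-style default box generation for one feature map.
import math

def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Yield (cx, cy, w, h) in relative [0, 1] image coordinates."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            # Box centers sit at the middle of each feature-map cell.
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                boxes.append((cx, cy, scale * math.sqrt(ar),
                              scale / math.sqrt(ar)))
    return boxes

# A 38x38 map at scale 0.1 yields 38*38*3 default boxes for this layer.
print(len(default_boxes(38, 0.1)))
```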
Article
Full-text available
The paper focuses on the design of an obstacle avoidance system for assisting visually impaired people. A disparity map is generated using a stereo camera carried by the user. Working on this map allows the development of an algorithm for obstacle detection in any kind of scenario. The design is completed with the introduction of an audio signal to assist the blind person in avoiding obstacles: the frequency of the signal encodes the obstacle's angle, and its intensity encodes proximity. Some experimental results are presented, as well as conclusions and future work.
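The described audio coding maps angle to frequency and proximity to loudness; a minimal sketch, where the frequency band, angular range, and far-distance limit are all assumptions rather than the paper's parameters:

```python
# Sketch of the angle->frequency, distance->amplitude sonification.
import numpy as np

def obstacle_tone(angle_deg, distance_m, sr=16000, dur_s=0.2):
    """Synthesize a short warning tone for one detected obstacle."""
    # Map an angle in [-45, 45] deg to a hypothetical 400-1200 Hz band.
    freq = 400.0 + (angle_deg + 45.0) / 90.0 * 800.0
    # Louder when closer; assume 4 m as the far limit of the warning.
    amp = max(0.0, 1.0 - distance_m / 4.0)
    t = np.linspace(0.0, dur_s, int(sr * dur_s), endpoint=False)
    return amp * np.sin(2.0 * np.pi * freq * t)

tone = obstacle_tone(angle_deg=10.0, distance_m=1.0)  # near, slightly right
```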
Conference Paper
Full-text available
The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called "ImageNet", a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.
Chapter
Visual impairment restricts everyday mobility and limits the accessibility of places, which for the non-visually impaired is taken for granted. A short walk to a close destination, such as a market or a school, becomes an everyday challenge. In this chapter, we present a novel solution to this problem that can evolve into an everyday visual aid for people with limited sight or total blindness. The proposed solution is a digital system, wearable like smart glasses, equipped with cameras. An intelligent system module, incorporating efficient deep learning and uncertainty-aware decision-making algorithms, interprets the video scenes, translates them into speech, and describes them to the user through audio. The user can almost naturally interact with the system via a speech-based user interface, which is also capable of understanding the user's emotions. The capabilities of this system are investigated in the context of accessibility and guidance in outdoor environments of cultural interest, such as the historic triangle of Athens. A survey of relevant state-of-the-art systems, technologies and services is performed, identifying critical system components that best adapt to the goals of the system, user needs and requirements, toward a user-centered architecture design.
Article
Our paper presents the development of a real-time system based on detection, classification, and position estimation of objects in an outdoor environment to provide visually impaired individuals with a voice output-based scene perception. The system is low-cost, lightweight, simple, and easily wearable. An Odroid board integrated with a USB camera and a USB laser is utilized for the purpose. To reduce utility problems, a user-centered design approach has been adopted, in which feedback from various individuals was obtained to understand their problems and requirements. The valuable insights gained from the feedback were then used to modify the system to best suit the requirements of the user. The object detection framework exploits a multimodal feature fusion-based deep learning architecture using edge, multiscale as well as optical flow information. Fusing edge information with raw data is motivated by the fact that stronger edge regions result in a higher number of activated neurons, hence inducing better feature representations. Learning deep features from multiple scales, as well as the use of motion dynamics at the feature level, leads to better semantic and discriminative representations, thus providing robustness to the detection framework. Experimental results on the PASCAL VOC 2007 dataset, the Caltech dataset, and captured real-time data are demonstrated.
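The edge-fusion idea, giving the network explicit edge strength alongside raw pixels, can be sketched as an extra input channel; this illustrates the general principle only, not the paper's actual architecture or fusion scheme:

```python
# Sketch of edge fusion: stack a Sobel edge-magnitude channel onto an
# RGB image so downstream layers see pixels plus explicit edge strength.
import numpy as np
from scipy import ndimage

def with_edge_channel(rgb):
    """Return an H x W x 4 array: RGB plus a normalized edge channel."""
    gray = rgb.mean(axis=2)
    gx = ndimage.sobel(gray, axis=1)   # horizontal gradient
    gy = ndimage.sobel(gray, axis=0)   # vertical gradient
    edges = np.hypot(gx, gy)
    edges /= edges.max() + 1e-9
    return np.dstack([rgb, edges])

x = with_edge_channel(np.random.rand(224, 224, 3))
```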
Article
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
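The two-player minimax game described above corresponds to the paper's value function, in which D is trained to distinguish data from samples and G to fool D:

\[
\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\!\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\!\left[\log\left(1 - D(G(z))\right)\right]
\]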
Conference Paper
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [7] and Fast R-CNN [5] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model [18], our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image. The code will be released.
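At each sliding-window position the RPN scores a fixed set of reference anchors; a minimal sketch of the paper's 3 scales × 3 aspect ratios = 9 anchors per position (the pixel scales and ratios follow the paper's defaults, while the width/height convention is one common choice):

```python
# Sketch of RPN-style anchor generation at one feature-map position.
import math

def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (cx, cy, w, h) anchors in pixels, centered at one position."""
    out = []
    for s in scales:
        for r in ratios:
            # Keep area s**2 while setting height/width ratio to r.
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            out.append((cx, cy, w, h))
    return out

print(len(anchors_at(8.0, 8.0)))  # 9 anchors per sliding-window position
```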
Conference Paper
Recently, large breakthroughs have been observed in saliency modeling. The top scores on saliency benchmarks have become dominated by neural network models of saliency, and some evaluation scores have begun to saturate. Large jumps in performance relative to previous models can be found across datasets, image types, and evaluation metrics. Have saliency models begun to converge on human performance? In this paper, we re-examine the current state-of-the-art using a fine-grained analysis on image types, individual images, and image regions. Using experiments to gather annotations for high-density regions of human eye fixations on images in two established saliency datasets, MIT300 and CAT2000, we quantify up to 60% of the remaining errors of saliency models. We argue that to continue to approach human-level performance, saliency models will need to discover higher-level concepts in images: text, objects of gaze and action, locations of motion, and expected locations of people in images. Moreover, they will need to reason about the relative importance of image regions, such as focusing on the most important person in the room or the most informative sign on the road. More accurately tracking performance will require finer-grained evaluations and metrics. Pushing performance further will require higher-level image understanding.
Article
We present YOLO, a unified pipeline for object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is also extremely fast; YOLO processes images in real-time at 45 frames per second, hundreds to thousands of times faster than existing detection systems. Our system uses global image context to detect and localize objects, making it less prone to background errors than top detection systems like R-CNN. By itself, YOLO detects objects at unprecedented speeds with moderate accuracy. When combined with state-of-the-art detectors, YOLO boosts performance by 2-3 mAP points.
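Decoding the single-evaluation grid output can be sketched from the description above; the S×S×(B·5+C) tensor layout with S=7, B=2, C=20 follows the paper's PASCAL VOC setup, while the per-box field order and the confidence threshold are assumptions:

```python
# Sketch of decoding a YOLO-style grid prediction into detections.
import numpy as np

S, B, C = 7, 2, 20  # grid size, boxes per cell, classes (VOC setup)

def decode(pred, conf_thresh=0.2):
    """Return (cell_i, cell_j, box, class_id, score) detections."""
    dets = []
    for i in range(S):
        for j in range(S):
            cell = pred[i, j]
            class_probs = cell[B * 5:]           # conditional class probs
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                scores = conf * class_probs      # class-specific confidence
                k = int(np.argmax(scores))
                if scores[k] > conf_thresh:
                    dets.append((i, j, (x, y, w, h), k, float(scores[k])))
    return dets

dets = decode(np.random.rand(S, S, B * 5 + C))
```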
Article
The traveling salesman problem (TSP) has many applications in economy, transport logistics [1], etc. It also has a wide range of applicability in mobile robot path planning optimization [2]. The paper presents research results on solving the path planning subproblem of the navigation of an intelligent autonomous mobile robotic agent. The final problem intended to be solved is the collection of objects by a mobile robotic agent. For the mobile robotic agent's path planning, an unsupervised neural network is used that can find a near-optimal path between two points in the agent's working area. We consider a modification of the criteria function for winner neuron selection. Simulation results are discussed at the end of the paper. The next development is the hardware implementation of the self-organizing map with real-time functioning.
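A minimal self-organizing-map tour sketch illustrates the general approach (an elastic ring of neurons contracting onto the points to visit); the paper's modified winner-selection criterion is not reproduced here, and all hyperparameters below are illustrative:

```python
# Sketch of a SOM on a neuron ring for near-optimal closed paths.
import numpy as np

rng = np.random.default_rng(0)
cities = rng.random((10, 2))                   # points to visit
neurons = rng.random((30, 2))                  # ring of neurons

for it in range(2000):
    city = cities[rng.integers(len(cities))]
    # Standard winner selection: the neuron nearest to the sampled city.
    winner = int(np.argmin(np.linalg.norm(neurons - city, axis=1)))
    lr = 0.8 * (1 - it / 2000)                 # decaying learning rate
    radius = max(1, int(10 * (1 - it / 2000))) # shrinking neighborhood
    for off in range(-radius, radius + 1):     # neighborhood on the ring
        idx = (winner + off) % len(neurons)
        influence = np.exp(-(off ** 2) / (2 * radius ** 2))
        neurons[idx] += lr * influence * (city - neurons[idx])

# Ordering cities by their nearest neuron's ring index gives the tour.
tour = np.argsort([int(np.argmin(np.linalg.norm(neurons - c, axis=1)))
                   for c in cities])
```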
Article
In this paper, we present a robust depth-based obstacle detection system in computer vision. The system aims to assist the visually impaired in detecting obstacles, with distance information, for safety. Through analysis of the depth map, segmentation and noise elimination are adopted to distinguish different objects according to the related depth information. An obstacle extraction mechanism is proposed to capture obstacles via various object properties revealed in the depth map. The proposed system can also be applied to emerging vision-based mobile applications, such as robots, intelligent vehicle navigation, and dynamic surveillance systems. Experimental results demonstrate that the proposed system achieves high accuracy. In the indoor environment, the average detection rate is above 96.1%. Even in the outdoor environment or in complete darkness, a 93.7% detection rate is achieved on average.
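A depth-threshold-plus-segmentation pipeline of this kind can be sketched in a few lines; the range and area thresholds are assumptions, not the paper's values, and the paper's noise elimination step is reduced to discarding small components:

```python
# Sketch of depth-based obstacle extraction: threshold the depth map
# into a near-range mask and keep large-enough connected components.
import numpy as np
from scipy import ndimage

def extract_obstacles(depth_m, max_range_m=2.0, min_area_px=500):
    """Return bounding slices of near objects found in a depth map."""
    mask = (depth_m > 0) & (depth_m < max_range_m)   # 0 = invalid pixel
    labels, n = ndimage.label(mask)                  # connected components
    regions = ndimage.find_objects(labels)
    return [sl for k, sl in enumerate(regions)
            if (labels[sl] == k + 1).sum() >= min_area_px]

obstacles = extract_obstacles(np.random.uniform(0, 5, (480, 640)))
```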
Conference Paper
An obstacle detection system based on visual selective features and stereo vision is presented in this paper. By extracting feature vectors for color, intensity and direction, the corresponding saliency map is constructed. Then, the obstacle area is identified through threshold segmentation of the saliency map, and 3-D information of the obstacles is computed using stereo vision. Finally, the obstacle information is transformed into voice output that can be delivered to the blind. Indoor and outdoor experiments have demonstrated the practicality and effectiveness of this method.
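The feature-to-saliency-to-threshold pipeline can be sketched with simple center-surround contrast maps; this toy version omits the direction (orientation) features and the paper's exact feature extraction, and the threshold is an assumption:

```python
# Rough sketch: build color/intensity feature maps, convert them to
# center-surround contrast, sum into a saliency map, and threshold it.
import numpy as np
from scipy import ndimage

def center_surround(feat):
    """Center-surround contrast as a difference of Gaussian blurs."""
    return np.abs(ndimage.gaussian_filter(feat, 1.0) -
                  ndimage.gaussian_filter(feat, 8.0))

def saliency(rgb):
    intensity = rgb.mean(axis=2)
    rg = rgb[..., 0] - rgb[..., 1]                   # red-green opponency
    by = rgb[..., 2] - (rgb[..., 0] + rgb[..., 1]) / 2
    sal = sum(center_surround(f) for f in (intensity, rg, by))
    return sal / (sal.max() + 1e-9)

def obstacle_regions(rgb, thresh=0.5):               # threshold assumed
    return saliency(rgb) > thresh

mask = obstacle_regions(np.random.rand(120, 160, 3))
```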
Conference Paper
The early recognition of potentially harmful traffic situations is an important goal of vision-based driver assistance systems. Pedestrians, in particular children, are highly endangered in inner-city traffic. Within the DaimlerChrysler UTA (Urban Traffic Assistance) project, we use stereo vision and motion analysis to manage those situations. The flow/depth constraint combines both methods in an elegant way and leads to a robust and powerful detection scheme.
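The flow/depth idea checks measured optical flow against the flow a static world would produce given stereo depth and ego-motion; a minimal sketch for purely forward translation, where flow is radial from the focus of expansion with magnitude proportional to ego speed over depth (the FOE position, ego speed, and residual threshold are all assumptions, not the paper's formulation):

```python
# Sketch of a flow/depth consistency check for purely forward motion:
# large residuals between measured and predicted flow flag moving objects.
import numpy as np

def flow_residual(px, py, flow, depth_m, foe=(320.0, 240.0), ego_speed=1.0):
    """Compare measured flow (u, v) with the static-world prediction."""
    expansion = ego_speed / depth_m            # radial expansion rate
    expected = np.array([(px - foe[0]) * expansion,
                         (py - foe[1]) * expansion])
    return float(np.linalg.norm(np.asarray(flow) - expected))

def is_moving(px, py, flow, depth_m, thresh=2.0):   # pixels/frame, assumed
    return flow_residual(px, py, flow, depth_m) > thresh

print(is_moving(400.0, 260.0, flow=(5.0, -1.0), depth_m=8.0))
```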