ENORASI: Intelligent Audio-visual System Enhancing Cultural Experience and Accessibility
Shadow detection is useful in a variety of image analysis applications, as it can improve scene understanding. Most recent shadow detection approaches use near-infrared (NIR) cameras and deep learning to provide enhanced segmentation of the shadow areas in images. In this paper, a novel shadow detection method is proposed, exploiting the perceptual color representation of the HSV color space and a physics-inspired optimization algorithm for image segmentation. Its advantage over state-of-the-art methods is that it achieves comparable performance with a simpler design and without requiring special equipment, such as NIR cameras. Quantitative and qualitative experiments on publicly available datasets, in comparison with three state-of-the-art methods, validate its effectiveness.
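As a rough illustration of why the HSV representation is convenient here, the sketch below converts RGB pixels to HSV and flags pixels with a low value (V) channel as shadow candidates. The fixed threshold `v_threshold` and the function name `shadow_candidates` are assumptions made for illustration; the abstract does not specify the physics-inspired optimization algorithm, and this simple threshold merely stands in for it.

```python
import colorsys


def shadow_candidates(rgb_image, v_threshold=0.35):
    """Flag pixels whose HSV value (brightness) falls below a threshold.

    rgb_image: list of rows, each row a list of (r, g, b) tuples in 0..255.
    v_threshold: hypothetical cut-off; the paper derives its segmentation
                 from an optimization algorithm, not a fixed value.
    """
    mask = []
    for row in rgb_image:
        mask_row = []
        for r, g, b in row:
            # colorsys expects channels scaled to the 0..1 range.
            _, _, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            mask_row.append(v < v_threshold)
        mask.append(mask_row)
    return mask


# One bright pixel and one dark, slightly bluish pixel.
image = [[(200, 200, 200), (40, 40, 50)]]
mask = shadow_candidates(image)
```

A plain brightness threshold cannot tell a shadow from a dark surface, which is one reason a more elaborate segmentation step is needed in practice.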
Every day, visually challenged people (VCP) face mobility restrictions and accessibility limitations. A short walk to a nearby destination, which other individuals take for granted, becomes a challenge. To tackle this problem, we propose a novel visual perception system for outdoor navigation that can evolve into an everyday visual aid for VCP. The proposed methodology is integrated in a wearable visual perception system (VPS). It efficiently combines deep learning object recognition models with an obstacle detection methodology based on human eye fixation prediction using Generative Adversarial Networks. Uncertainty-aware modeling of obstacle risk assessment and spatial localization, following a fuzzy logic approach, is employed for robust obstacle detection. This combination can translate the position and type of detected obstacles into descriptive linguistic expressions, allowing users to easily understand where the obstacles are in the environment and avoid them. The performance and capabilities of the proposed method are investigated in the context of safe navigation of VCP in outdoor environments of cultural interest through obstacle recognition and detection. Additionally, the proposed system is compared with relevant state-of-the-art systems for the safe navigation of VCP, with a focus on design and satisfaction of user requirements.
Technological advances in Cloud computing and networking offer unique opportunities for assisting the everyday activities of visually impaired persons. Of particular interest is the use of these technologies to aid mobility and environment perception. In this chapter, we describe a generic system architecture design and discuss research and engineering issues toward developing a modular open Cloud platform that can be customized to match specific activities of visually impaired people. This platform facilitates remote content delivery to support system services, such as complex object/scene recognition and route planning, and also enables visually impaired users to connect with remotely located persons for assistance or social interaction. At the core of the proposed framework is a sensor-based system that streams data, particularly video, for further processing in the Cloud. Although a multitude of applications rely on Cloud computing for data stream processing and data fusion, the particular requirements that assistive technologies for the visually impaired must meet pose unique research challenges, which are outlined in the chapter.
Visual impairment restricts everyday mobility and limits the accessibility of places that the non-visually impaired take for granted. A short walk to a nearby destination, such as a market or a school, becomes an everyday challenge. In this chapter, we present a novel solution to this problem that can evolve into an everyday visual aid for people with limited sight or total blindness. The proposed solution is a digital system, wearable like smart-glasses and equipped with cameras. An intelligent system module, incorporating efficient deep learning and uncertainty-aware decision-making algorithms, interprets the video scenes and describes them to the user through synthesized speech. The user can interact with the system almost naturally via a speech-based user interface, which is also capable of understanding the user’s emotions. The capabilities of this system are investigated in the context of accessibility and guidance in outdoor environments of cultural interest, such as the historic triangle of Athens. A survey of relevant state-of-the-art systems, technologies and services is performed, identifying the critical system components that best fit the goals of the system and the users' needs and requirements, toward a user-centered architecture design.
Video coding incurs high computational complexity, particularly at the encoder side. For this reason, parallelism is used at the various encoding steps. One of the popular coarse-grained parallelization tools offered by many standards is wavefront parallelism. Under this scheme, each row of blocks is assigned to a separate thread for processing. A thread may commence encoding a particular block only once certain precedence constraints are met: the left block of the same row and the top and top-right blocks of the previous row must have finished compression. Clearly, the imposed constraints result in processing delays. Therefore, to optimize performance, it is of paramount importance to identify potential bottlenecks before the compression of a frame starts, so that they can be alleviated through better resource allocation. In this paper we present a simulation model that predicts bottlenecks based on the estimated block compression times produced by a regression neural network. Experiments with datasets obtained using the reference encoder of HEVC (High Efficiency Video Coding) illustrate the merits of the proposed model.
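The precedence constraints described above can be sketched as a small simulation. The code below is a minimal sketch, not the paper's model: it assumes one thread per row with no thread limit, and it takes the per-block compression times as given (in the paper these would come from the regression neural network; the numbers in the usage example are made up). The name `wavefront_finish_times` is hypothetical.

```python
def wavefront_finish_times(block_times):
    """Simulate wavefront encoding and return each block's finish time.

    block_times[r][c] is the (estimated) compression time of the block in
    row r, column c. A block may start only after the left block of the
    same row and the top and top-right blocks of the previous row finish.
    """
    rows = len(block_times)
    cols = len(block_times[0])
    finish = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = []
            if c > 0:
                deps.append(finish[r][c - 1])          # left block
            if r > 0:
                deps.append(finish[r - 1][c])          # top block
                if c + 1 < cols:
                    deps.append(finish[r - 1][c + 1])  # top-right block
            start = max(deps, default=0.0)
            finish[r][c] = start + block_times[r][c]
    return finish


# Hypothetical estimates: one slow block in the first row.
times = [[1, 4, 1],
         [1, 1, 1]]
finish = wavefront_finish_times(times)
frame_time = max(max(row) for row in finish)
```

In this toy frame the slow block delays the whole second row, since its first block must wait for the top-right neighbor. Comparing each block's start time with the finish time of its own left neighbor is one simple way to spot where a row stalls.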
Staircase detection in natural images has several applications in the context of robotics and visually impaired navigation. Previous works are mainly based on handcrafted feature extraction and supervised learning using fully annotated images. In this work we address the problem of staircase detection in weakly labeled natural images, using a novel Fully Convolutional neural Network (FCN), named LB-FCN light. The proposed network is an enhanced version of our recent Look-Behind FCN (LB-FCN), suitable for deployment on mobile and embedded devices. Its architecture features multi-scale feature extraction, depthwise separable convolutions and residual learning. To evaluate its computational and classification performance, we have created a weakly-labeled benchmark dataset from publicly available images. The results from the experimental evaluation of LB-FCN light indicate its advantageous performance over the relevant state-of-the-art architectures.
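To give a sense of why depthwise separable convolutions suit mobile and embedded deployment, the following sketch compares weight counts for a standard convolution and its depthwise separable counterpart. The layer sizes in the example are arbitrary illustrations, not the LB-FCN light configuration, and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = kernel * kernel * c_in   # one spatial filter per input channel
    pointwise = c_in * c_out             # 1x1 conv that mixes channels
    return depthwise + pointwise


# Arbitrary example layer: 3x3 kernel, 64 input and 128 output channels.
standard = standard_conv_params(3, 64, 128)         # 73728
separable = depthwise_separable_params(3, 64, 128)  # 8768
```

For this example layer the separable form uses roughly 8.4 times fewer weights.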
Obstacle detection addresses the detection of an object, of any kind, that interferes with the canonical trajectory of a subject, such as a human or an autonomous robotic agent. Prompt obstacle detection can become critical for the safety of visually impaired individuals (VII). In this context, we propose a novel methodology for obstacle detection, which is based on a Generative Adversarial Network (GAN) model, trained with human eye fixations to predict saliency, and on the depth information provided by an RGB-D sensor. A method based on fuzzy sets is used to translate the 3D spatial information into linguistic values easily comprehensible by VII. Fuzzy operators are applied to fuse the spatial information with the saliency information in order to determine whether an object may interfere with the safe navigation of the VII. For the evaluation of our method, we captured outdoor video sequences of 10,170 frames in total, with obstacles including rocks, trees and pedestrians. The results showed that the use of fuzzy representations leads to enhanced obstacle detection accuracy, reaching 88.1%.
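The fuzzy translation and fusion steps can be illustrated with a minimal sketch. Everything specific here is an assumption: the membership shapes, the distance breakpoints (1, 3 and 5 meters), the saliency breakpoints, the labels "near", "medium" and "far", and the use of `min` as the fuzzy AND operator. The abstract does not state which membership functions or operators the authors use.

```python
def falling(x, lo, hi):
    """Membership that is 1 below lo, 0 above hi, and linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def rising(x, lo, hi):
    """Complement of falling: 0 below lo, 1 above hi."""
    return 1.0 - falling(x, lo, hi)


def depth_memberships(depth_m):
    """Map a depth in meters to degrees of hypothetical linguistic values."""
    return {
        "near": falling(depth_m, 1.0, 3.0),
        "medium": min(rising(depth_m, 1.0, 3.0), falling(depth_m, 3.0, 5.0)),
        "far": rising(depth_m, 3.0, 5.0),
    }


def describe(depth_m):
    """Pick the linguistic value with the highest membership degree."""
    memberships = depth_memberships(depth_m)
    return max(memberships, key=memberships.get)


def obstacle_risk(depth_m, saliency):
    """Fuse depth and saliency with a fuzzy AND (min): near AND salient."""
    near = depth_memberships(depth_m)["near"]
    salient = rising(saliency, 0.3, 0.7)  # saliency assumed to lie in 0..1
    return min(near, salient)


label = describe(0.5)            # "near"
risk = obstacle_risk(2.0, 0.9)   # 0.5: partly near, fully salient
```

With the minimum as the AND operator, a region counts as a risk only to the degree that it is both close and salient, so a salient but distant object yields a risk of zero.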