Chapter

Cloud Video Guidance as “Deus ex Machina” for the Visually Impaired


Abstract

Technological advances in Cloud computing and networking are offering unique opportunities for assisting everyday activities of visually impaired persons. Of particular interest is the capitalization of these technologies in the domain of aiding mobility and environment perception. In this chapter, we describe a generic system architecture design and discuss research and engineering issues toward developing a modular open Cloud platform that can be used in a customized way to match specific activities of visually impaired people. This platform facilitates remote content delivery to support system services, such as complex object/scene recognition and route planning, but also it enables the visually impaired users to connect with remotely located persons for assistance or social interaction. At the core of the proposed framework rests a sensor-based system that streams data, particularly video, for further processing in the Cloud. Although a multitude of applications are based on Cloud computing for data stream processing and data fusion, the particular requirements that must be met by assistive technologies for the visually impaired pose unique research challenges that are outlined in the chapter.
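To make the data flow concrete, the following is a minimal sketch of the client side of such an architecture: a wearable sensor node captures video frames, compresses them, and ships them to a Cloud service for object/scene recognition and route planning. The endpoint URL and the JSON response fields ("labels", "instruction") are hypothetical placeholders, not part of the chapter's design.

```python
# Client-side sketch: capture frames and stream them to a (hypothetical) Cloud API.
import cv2
import requests

CLOUD_ENDPOINT = "https://example-cloud-service/api/v1/analyze"  # assumed placeholder

def stream_to_cloud(camera_index: int = 0, max_frames: int = 100) -> None:
    cap = cv2.VideoCapture(camera_index)
    try:
        for _ in range(max_frames):
            ok, frame = cap.read()
            if not ok:
                break
            # Compress each frame to JPEG to keep the uplink bandwidth low.
            ok, jpeg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
            if not ok:
                continue
            resp = requests.post(
                CLOUD_ENDPOINT,
                data=jpeg.tobytes(),
                headers={"Content-Type": "image/jpeg"},
                timeout=5,
            )
            result = resp.json()
            # The Cloud returns recognition results and a navigation hint; a real
            # client would convert these to audio or haptic feedback for the user.
            print(result.get("labels"), result.get("instruction"))
    finally:
        cap.release()

if __name__ == "__main__":
    stream_to_cloud()
```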


... The system can provide directions for assisted navigation of VIIs as well as for their guided touring. Furthermore, the system has a social network aspect that enables the guided users to share their experiences with other remotely located people [50]. This functionality is based on a stereoscopic video streaming service and could provide a unique experience of virtually visiting various sites through 3D virtual reality (VR) technologies. ...
Article
Full-text available
The marginalization of people with disabilities, such as visually impaired individuals (VIIs), has driven scientists to take advantage of the fast growth of smart technologies and develop smart assistive systems (SASs) to bring VIIs back to social life, education and even culture. Our research focuses on developing a human–computer interactive system that guides VIIs in outdoor cultural environments by offering universal access to cultural information, social networking and safe navigation, among other services. The VI users interact with the computer-based SAS to control the system during its operation, while having access to a remote connection with non-VIIs for external guidance and company. The development of such a system needs a user-centered design (UCD) that incorporates the elicitation of the requirements necessary for satisfying operation by the VI users. In this paper, we present a novel SAS for VIIs and its design considerations, which follow a UCD approach to determine a set of operational, functional, ergonomic, environmental and optional requirements of the system. Both VIIs and non-VIIs took part in a series of interviews and questionnaires, from which data were analyzed to form the requirements of the system for both on-site and remote use. The final requirements were tested in trials, and their evaluation and results are presented. The experimental investigations gave significant feedback for the development of the system throughout the design process. The most important contribution of this study is the derivation of requirements applicable not only to the specific system under investigation, but also to other relevant SASs for VIIs.
Chapter
Full-text available
Nowadays, technology is developing rapidly. The middle class has access to technologies such as smartphones, but unfortunately there are still many applications that are not friendly to people with disabilities. A close example is Indonesia, where people with disabilities still receive little attention and development support facilities do not consider the comfort of disabled users. This paper presents several features to help users, especially those who are blind: chat to speak, chat using voice, motion detection for emergency needs, object detection, voice to search engine, and weather information. In addition, a use case diagram is used to describe the application process and a class diagram to describe the database design. Keywords: Mobile application, Application for disabilities, Vision assistant apps, Blind users application, Information systems.
Chapter
Monitoring healthcare is a major issue that requires attention. In underdeveloped countries, the number of nurses per patient is relatively low and access to 24-hour medical supervision is uncertain, resulting in easily avoidable deaths as well as urgent situations that disturb the health sector. The medicine dispensers currently available are expensive, and devices that combine a reminder with a dispenser are hard to come by. The major goal of the medicine reminder is to automatically transmit an alarm and dispense medicine to the correct individual at the stated time from a single machine. An automatic medicine dispenser is developed for persons who take medicine without expert direction. It can be used by a single patient or by a group of patients, and it relieves the individual of the error-prone job of taking the wrong medicine at the wrong time. The main goal is to keep the device simple to use and affordable, with working software that is reliable and stable. The older population will benefit greatly from the device because it can substitute for expensive medical treatment and the money spent on a personal nurse. Keywords: Internet of Things, Medical update frameworks, GSM module, Arduino.
Article
Full-text available
Since the launch of Google Glass in 2014, smart glasses have mainly been designed to support micro-interactions. The ultimate goal for them to become an augmented reality interface has not yet been attained due to an encumbrance of controls. Augmented reality involves superimposing interactive computer graphics images onto physical objects in the real world. This survey reviews current research issues in the area of human-computer interaction for smart glasses. The survey first studies the smart glasses available in the market and afterwards investigates the interaction methods proposed in the wide body of literature. The interaction methods can be classified into hand-held, touch, and touchless input. This paper mainly focuses on the touch and touchless input. Touch input can be further divided into on-device and on-body, while touchless input can be classified into hands-free and freehand. Next, we summarize the existing research efforts and trends, in which touch and touchless input are evaluated by a total of eight interaction goals. Finally, we discuss several key design challenges and the possibility of multi-modal input for smart glasses.
Article
Full-text available
The fourth author name was missed in the original publication. The correct list of authors should read as “Hugo Fernandes, Paulo Costa, Vitor Filipe, Hugo Paredes, João Barroso”. It has been corrected in this erratum. The original article has been updated.
Article
Full-text available
Under several emerging application scenarios, such as smart cities, operational monitoring of large infrastructure, wearable assistance, and the Internet of Things, continuous data streams must be processed under very short delays. Several solutions, including multiple software engines, have been developed for processing unbounded data streams in a scalable and efficient manner. More recently, architectures have been proposed that use edge computing for data stream processing. This paper surveys the state of the art on stream processing engines and mechanisms for exploiting resource elasticity features of cloud computing in stream processing. Resource elasticity allows an application or service to scale out/in according to fluctuating demands. Although such features have been extensively investigated for enterprise applications, stream processing poses challenges in achieving elastic systems that can make efficient resource management decisions based on current load. Elasticity becomes even more challenging in highly distributed environments comprising edge and cloud computing resources. This work examines some of these challenges and discusses solutions proposed in the literature to address them.
Article
Full-text available
The overall objective of this work is to review the assistive technologies that have been proposed by researchers in recent years to address the limitations in user mobility posed by visual impairment. This work presents an “umbrella review.” Visually impaired people often want more than just information about their location and often need to relate their current location to the features existing in the surrounding environment. Extensive research has been dedicated to building assistive systems. Assistive systems for human navigation, in general, aim to allow their users to safely and efficiently navigate in unfamiliar environments by dynamically planning the path based on the user’s location, respecting the constraints posed by their special needs. Modern mobile assistive technologies are becoming more discreet and include a wide range of mobile computerized devices, including ubiquitous technologies such as mobile phones. Technology can be used to determine the user’s location and their relation to the surroundings (context), generate navigation instructions, and deliver all this information to the blind user.
Article
Full-text available
Blind or visually impaired (BVI) individuals are capable of identifying an object in their hands by combining visual cues (when available) with manipulation. It is harder for them to associate the object with a specific brand, a model, or a type. Starting from this observation, we propose a collaborative system designed to deliver visual feedback automatically and to help the user fill this semantic gap. Our visual recognition module is implemented by means of an image retrieval procedure that provides real-time feedback, performs the computation locally on the device, and is scalable to new categories and instances. We carry out a thorough experimental analysis of the visual recognition module, which includes a comparative analysis with the state of the art. We also present two different system implementations that we test with the help of BVI users to evaluate the technical soundness, the usability, and the effectiveness of the proposed concept.
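A minimal sketch of an image-retrieval-style recognition module of this kind is shown below: each reference image is represented by a feature vector, and a query inherits the label of its nearest neighbours. The colour-histogram descriptor used here is a stand-in assumption, not the paper's actual feature representation.

```python
# Retrieval-based recognition sketch: nearest-neighbour voting over feature vectors.
import numpy as np

def colour_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
    # image: HxWx3 uint8 array; returns a normalised colour histogram.
    hist, _ = np.histogramdd(
        image.reshape(-1, 3), bins=(bins, bins, bins), range=((0, 256),) * 3
    )
    hist = hist.flatten()
    return hist / (hist.sum() + 1e-9)

class RetrievalRecognizer:
    def __init__(self):
        self.features = []
        self.labels = []

    def add_reference(self, image: np.ndarray, label: str) -> None:
        # New categories/instances are added simply by appending references,
        # which is what makes a retrieval approach easy to extend.
        self.features.append(colour_histogram(image))
        self.labels.append(label)

    def recognize(self, image: np.ndarray, k: int = 3) -> str:
        query = colour_histogram(image)
        dists = [np.linalg.norm(query - f) for f in self.features]
        nearest = np.argsort(dists)[:k]
        votes = [self.labels[i] for i in nearest]
        return max(set(votes), key=votes.count)
```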
Conference Paper
Full-text available
When preparing to visit new locations, sighted people often look at maps to build an a priori mental representation of the environment as a sequence of step-by-step actions and points of interest (POIs), e.g., turn right after the coffee shop. Based on this observation, we would like to understand if building the same type of sequential representation, prior to navigating in a new location, is helpful for people with visual impairments (VI). In particular, our goal is to understand how the simultaneous interplay between turn-by-turn navigation instructions and the relevant POIs in the route can aid the creation of a memorable sequential representation of the world. To this end, we present two smartphone-based virtual navigation interfaces: VirtualLeap, which allows the user to jump through a sequence of street intersection labels, turn-by-turn instructions and POIs along the route; and VirtualWalk, which simulates variable speed step-by-step walking using audio effects, whilst conveying similar route information. In a user study with 14 VI participants, most were able to create and maintain an accurate mental representation of both the sequential structure of the route and the approximate locations of the POIs. While both virtual navigation modalities resulted in similar spatial understanding, results suggest that each method is useful in different interaction contexts.
Article
Full-text available
In recent years, we have witnessed an explosion in the growth of social media networks, powered by the proliferation of handheld smart devices with high processing capabilities and a plethora of sensors including high-resolution cameras. A key component of information exchange in such networks, accounting for the majority of network traffic, is video. Currently, the de facto video coding standard in use is H.264/AVC, which was sufficient in addressing the challenges posed by HD more than a decade ago, but is less than efficient in the new era of 4K smart device cameras and 8K TV screens. Given that newer standards exist and are capable of achieving higher compression rates at the same quality compared to H.264/AVC, we envision that within the next few years, the related industry will shift toward one of the newer video coding standards. For a social media network, such a transition poses manifold challenges, one of them being the need to transcode previous content into the newly adopted standard. In this paper, we illustrate a framework for performing such a transition in a smooth manner. The framework, algorithms and strategies developed are applicable, perhaps with minor changes, regardless of the targeted standard for adoption. We detail the framework components through simulation experiments, using as a yardstick the adoption of high efficiency video coding. Results demonstrate that depending on the targeted social platform, different strategies should be applied, while the costs and benefits of the paradigm shift may vary significantly.
Conference Paper
Full-text available
The era of big data has led to the emergence of new systems for real-time distributed stream processing; e.g., Apache Storm is one of the most popular stream processing systems in industry today. However, Storm, like many other stream processing systems, lacks an intelligent scheduling mechanism. The default round-robin scheduling currently deployed in Storm disregards resource demands and availability, and can therefore be inefficient at times. We present R-Storm (Resource-Aware Storm), a system that implements resource-aware scheduling within Storm. R-Storm is designed to increase overall throughput by maximizing resource utilization while minimizing network latency. When scheduling tasks, R-Storm can satisfy both soft and hard resource constraints as well as minimize the network distance between components that communicate with each other. We evaluate R-Storm on a set of micro-benchmark Storm applications as well as Storm applications used in production at Yahoo! Inc. From our experimental results we conclude that R-Storm achieves 30-47% higher throughput and 69-350% better CPU utilization than default Storm for the micro-benchmarks. For the Yahoo! Storm applications, R-Storm outperforms default Storm by around 50% based on overall throughput. We also demonstrate that R-Storm performs much better when scheduling multiple Storm applications than default Storm.
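The sketch below illustrates the general idea of resource-aware placement in this spirit: tasks declare demands, nodes declare capacities, and each task is greedily assigned to a feasible node close to the nodes hosting the tasks it communicates with. This is only an illustrative approximation of the concept, not the R-Storm algorithm; the data structures and the distance metric are assumptions.

```python
# Greedy resource-aware task placement sketch (illustrative, not R-Storm itself).
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cpu: float
    mem: float
    assigned: list = field(default_factory=list)

def network_distance(a: str, b: str) -> int:
    # Placeholder metric: 0 if the two tasks land on the same node, 1 otherwise.
    return 0 if a == b else 1

def place_tasks(tasks, edges, nodes):
    """tasks: {task: (cpu, mem)}, edges: [(t1, t2), ...], nodes: [Node, ...]."""
    placement = {}
    for task, (cpu, mem) in tasks.items():
        feasible = [n for n in nodes if n.cpu >= cpu and n.mem >= mem]
        if not feasible:
            raise RuntimeError(f"no node can host {task}")
        # Nodes already hosting communication partners of this task.
        neighbours = [placement[a if b == task else b]
                      for a, b in edges
                      if task in (a, b) and (a in placement or b in placement)]
        def cost(node):
            return sum(network_distance(node.name, p.name) for p in neighbours)
        best = min(feasible, key=cost)
        best.cpu -= cpu
        best.mem -= mem
        best.assigned.append(task)
        placement[task] = best
    return {t: n.name for t, n in placement.items()}

nodes = [Node("n1", cpu=4, mem=8), Node("n2", cpu=2, mem=4)]
print(place_tasks({"spout": (1, 1), "bolt": (1, 1)}, [("spout", "bolt")], nodes))
```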
Conference Paper
Full-text available
Energy efficiency of servers has become a significant research topic over the last years, as server energy consumption varies depending on multiple factors, such as server utilization and workload type. Server energy analysis and estimation must take all relevant factors into account to ensure reliable estimates and conclusions. Thorough system analysis requires benchmarks capable of testing different system resources at different load levels using multiple workload types. Server energy estimation approaches, on the other hand, require knowledge about the interactions of these factors for the creation of accurate power models. Common approaches to energy-aware workload classification categorize workloads depending on the resource types used by the different workloads. However, they rarely take into account differences in workloads targeting the same resources. Industrial energy-efficiency benchmarks typically do not evaluate the system's energy consumption at different resource load levels, and they only provide data for system analysis at maximum system load. In this paper, we benchmark multiple server configurations using the CPU worklets included in SPEC's Server Efficiency Rating Tool (SERT). We evaluate the impact of load levels and different CPU workloads on power consumption and energy efficiency. We analyze how functions approximating the measured power consumption differ over multiple server configurations and architectures. We show that workloads targeting the same resource can differ significantly in their power draw and energy efficiency. The power consumption of a given workload type varies depending on utilization, hardware and software configuration. The power consumption of CPU-intensive workloads does not scale uniformly with increased load, nor do hardware or software configuration changes affect it in a uniform manner.
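The snippet below sketches the kind of analysis described above: fitting a function that approximates measured power consumption against load level for a single workload and server configuration. The sample data points are invented for illustration only; they are not SERT measurements.

```python
# Fit a simple power model (power vs. load level) to illustrative measurements.
import numpy as np

# (load level in %, measured power in watts) -- invented values for illustration.
measurements = [(0, 45.0), (25, 78.0), (50, 104.0), (75, 131.0), (100, 162.0)]

load = np.array([m[0] for m in measurements])
power = np.array([m[1] for m in measurements])

# A quadratic fit captures the common observation that power does not scale
# linearly (or uniformly) with utilization.
coeffs = np.polyfit(load, power, deg=2)
model = np.poly1d(coeffs)

for lvl in (10, 60, 90):
    print(f"estimated power at {lvl}% load: {model(lvl):.1f} W")

# Energy efficiency at each level can then be compared as work per joule,
# e.g. throughput(lvl) / model(lvl), once throughput measurements are available.
```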
Conference Paper
Full-text available
The task of moving from one place to another is a difficult challenge that involves obstacle avoidance, staying on street walks, finding doors, knowing the current location and keeping on track through the desired path. Nowadays, navigation systems are widely used to find the correct path, or the quickest, between two places. While assistive technology has contributed to the improvement of the quality of life of people with disabilities, people with visual impairment still face enormous limitations in terms of their mobility. In recent years, several approaches have been made to create systems that allow seamless tracking and navigation both in indoor and outdoor environments. However, there is still an enormous lack of availability of information that can be used to assist the navigation of users with visual impairments, as well as a lack of sufficient precision in terms of the estimation of the user’s location. Blavigator is a navigation system designed to help users with visual impairments. In a known location, the use of object recognition algorithms can provide contextual feedback to the user and even serve as a validator to the positioning module and geographic information system of a navigation system for the visually impaired. This paper proposes a method in which computer vision algorithms validate the outputs of the positioning system of the Blavigator prototype.
Article
Full-text available
Existing electronic travel aids have not been widely used by blind communities in Thailand due to their low performance, unattractive appearance, impracticality, and high cost. This paper proposes iSonar, a miniaturized, high-performance, and low-cost obstacle warning device for the blind. An ultrasonic transducer is used to detect obstacles at head and upper-body levels, and tactile feedback at different vibration frequencies warns the user so that collisions can be avoided. Our prototype devices have been tested with fifteen blind volunteers and simulated obstacles. Experimental results showed that iSonar reduced collision rates from 33.33 percent to 6.67 percent.
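A minimal sketch of this kind of obstacle-warning logic follows: an ultrasonic distance reading is mapped to a vibration frequency, with closer obstacles producing faster vibration. The thresholds and frequency range are illustrative assumptions, not the iSonar calibration.

```python
# Map an ultrasonic distance reading (cm) to a tactile vibration frequency (Hz).
def vibration_frequency_hz(distance_cm: float,
                           min_range_cm: float = 20.0,
                           max_range_cm: float = 200.0,
                           min_freq_hz: float = 1.0,
                           max_freq_hz: float = 10.0) -> float:
    """Return 0 if no obstacle is in range, otherwise a frequency that grows
    linearly as the obstacle gets closer."""
    if distance_cm >= max_range_cm:
        return 0.0
    d = max(distance_cm, min_range_cm)
    closeness = (max_range_cm - d) / (max_range_cm - min_range_cm)
    return min_freq_hz + closeness * (max_freq_hz - min_freq_hz)

if __name__ == "__main__":
    for d in (250, 150, 80, 25):
        print(d, "cm ->", round(vibration_frequency_hz(d), 2), "Hz")
```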
Article
Full-text available
This paper describes a novel multi-frame and multi-slice parallel video encoding approach with simultaneous encoding of predicted frames. The approach, when applied to H.264 encoding, leads to speedups comparable to those obtained by state-of-the-art approaches, but without the disadvantage of requiring bidirectional frames. The new approach uses a number of slices equal to or greater than the number of cores used and supports three motion estimation modes. Their combination leads to various tradeoffs between speedup and visual quality loss. For an H.264 baseline profile encoder based on Intel IPP code samples running on a two quad-core Xeon system (8 cores in total), our experiments show an average speedup of 7.20×, with an average quality loss of 0.22 dB (compared to a non-parallelized version) for the most efficient motion estimation mode, and an average speedup of 7.95×, with a quality loss of 1.85 dB, for the faster motion estimation mode.
Article
Full-text available
Unlike H.264/advanced video coding, where parallelism was an afterthought, High Efficiency Video Coding currently contains several proposals aimed at making it more parallel-friendly. A performance comparison of the different proposals, however, has not yet been performed. In this paper, we will fill this gap by presenting efficient implementations of the most promising parallelization proposals, namely tiles and wavefront parallel processing (WPP). In addition, we present a novel approach called overlapped wavefront (OWF), which achieves higher performance and efficiency than tiles and WPP. Experiments conducted on a 12-core system running at 3.33 GHz show that our implementations achieve average speedups, for 4k sequences, of 8.7, 9.3, and 10.7 for WPP, tiles, and OWF, respectively.
Conference Paper
Full-text available
Orientation and mobility are tremendous problems for Blind people. Assistive technologies based on the Global Positioning System (GPS) could provide them with remarkable autonomy. Unfortunately, GPS accuracy, Geographical Information System (GIS) data and map-matching techniques are adapted to vehicle navigation only, and fail in assisting pedestrian navigation, especially for the Blind. In this paper, we designed an assistive device for the Blind based on adapted GIS and fusion of GPS and vision-based positioning. The proposed assistive device may improve user positioning, even in urban environments where GPS signals are degraded. The estimated position would then be compatible with assisted navigation for the Blind. Interestingly, the vision module may also answer the needs of the Blind by providing them with situational awareness (localizing objects of interest) along the path. Note that the solution proposed for positioning could also enhance the localization of autonomous robots or vehicles.
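The sketch below shows the simplest form of the fusion idea described above: combining a noisy GPS fix with a more precise vision-based estimate by inverse-variance weighting. Diagonal covariances and the sample values are assumptions for illustration; this is not the paper's estimator.

```python
# Inverse-variance fusion of a GPS position fix with a vision-based estimate.
import numpy as np

def fuse(gps_pos, gps_var, vision_pos, vision_var):
    """Each *_pos is a 2D position (easting, northing) in metres;
    each *_var is the corresponding per-axis variance."""
    gps_pos, vision_pos = np.asarray(gps_pos), np.asarray(vision_pos)
    gps_var, vision_var = np.asarray(gps_var), np.asarray(vision_var)
    w_gps = 1.0 / gps_var
    w_vis = 1.0 / vision_var
    fused = (w_gps * gps_pos + w_vis * vision_pos) / (w_gps + w_vis)
    fused_var = 1.0 / (w_gps + w_vis)
    return fused, fused_var

# Example: GPS degraded in an urban canyon (large variance), vision more precise.
pos, var = fuse(gps_pos=(105.0, 42.0), gps_var=(100.0, 100.0),
                vision_pos=(101.5, 40.2), vision_var=(4.0, 4.0))
print(pos, var)
```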
Conference Paper
Full-text available
In today's Web, Web services are created and updated on the fly. It is already beyond human ability to analyze them and generate composition plans manually. A number of approaches have been proposed to tackle this problem. Most of them are inspired by research in cross-enterprise workflow and AI planning. This paper gives an overview of recent research efforts on automatic Web service composition from both the workflow and the AI planning research communities.
Conference Paper
Full-text available
Recent technology trends in the Web Services (WS) domain indicate that a solution eliminating the presumed complexity of the WS-* standards may be in sight: advocates of REpresentational State Transfer (REST) have come to believe that their ideas explaining why the World Wide Web works are just as applicable to solve enterprise application integration problems and to simplify the plumbing required to build service-oriented architectures. In this paper we objectify the WS-* vs. REST debate by giving a quantitative technical comparison based on architectural principles and decisions. We show that the two approaches differ in the number of architectural decisions that must be made and in the number of available alternatives. This discrepancy between freedom-from-choice and freedom-of-choice explains the complexity difference perceived. However, we also show that there are significant differences in the consequences of certain decisions in terms of resulting development and maintenance costs. Our comparison helps technical decision makers to assess the two integration styles and technologies more objectively and select the one that best fits their needs: REST is well suited for basic, ad hoc integration scenarios, WS-* is more flexible and addresses advanced quality of service requirements commonly occurring in enterprise computing.
Article
Full-text available
Most programs today are written not by professional software developers, but by people with expertise in other domains working towards goals for which they need computational support. For example, a teacher might write a grading spreadsheet to save time grading, or an interaction designer might use an interface builder to test some user interface design ideas. Although these end-user programmers may not have the same goals as professional developers, they do face many of the same software engineering challenges, including understanding their requirements, as well as making decisions about design, reuse, integration, testing, and debugging. This article summarizes and classifies research on these activities, defining the area of End-User Software Engineering (EUSE) and related terminology. The article then discusses empirical research about end-user software engineering activities and the technologies designed to support them. The article also addresses several crosscutting issues in the design of EUSE tools, including the roles of risk, reward, domain complexity, and self-efficacy, and the potential of educating users about software engineering principles.
Article
Video transcoding is the process of encoding an initial video sequence into multiple sequences of different bitrates, resolutions, and video standards, so that it can be viewed on devices of various capabilities and with various network access characteristics. Because video coding is a computationally expensive process and the amount of video in social-media networks drastically increases every year, large media providers' demand for cloud transcoding services will continue to rise. This article surveys the state of the art of related cloud services. It also summarizes research on video transcoding and provides indicative results for a transcoding scenario of interest related to Facebook. Finally, it illustrates open challenges in the field and outlines paths for future research.
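As a minimal illustration of the basic operation underlying such services, the sketch below re-encodes an H.264/AVC file to HEVC with ffmpeg. It assumes an ffmpeg build with libx265 is available on the host; the file paths and quality settings are illustrative.

```python
# Re-encode an H.264 file to HEVC by invoking ffmpeg (assumes libx265 is built in).
import subprocess

def transcode_to_hevc(src: str, dst: str, crf: int = 28) -> None:
    cmd = [
        "ffmpeg", "-y",
        "-i", src,
        "-c:v", "libx265",   # HEVC encoder
        "-crf", str(crf),    # constant-quality target
        "-c:a", "copy",      # leave the audio track untouched
        dst,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    transcode_to_hevc("input_h264.mp4", "output_hevc.mp4")
```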
Conference Paper
Safe navigation and detailed perception in unfamiliar environments are challenging activities for blind people. This paper proposes a cloud- and vision-based navigation system for the blind. The goal of the system is not only to provide navigation, but also to help blind people perceive the world in as much detail as possible and live like a normal person. The proposed system includes a helmet with stereo cameras molded in the front, an Android-based smartphone, a web application and a cloud computing platform. The cloud computing platform is the core of the system; it integrates object detection and recognition, OCR (Optical Character Recognition), speech processing, vision-based SLAM (Simultaneous Localization and Mapping) and path planning, all based on deep learning algorithms. Blind users interact with the system by voice. The cloud platform communicates with the smartphone through Wi-Fi or 4G mobile communication technology. To test the system performance, two groups of tests have been conducted: one on perception and the other on navigation. Test results show that the proposed system can provide richer surrounding information and more accurate navigation, and they verify the practicability of the newly proposed system.
Article
Service migration between datacenters can reduce the network overhead within a cloud infrastructure, thereby also improving the quality of service for the clients. Most of the algorithms in the literature assume that the client access pattern remains stable for a sufficiently long period so as to amortize such migrations. However, if such an assumption does not hold, these algorithms can take arbitrarily poor migration decisions that can substantially degrade system performance. In this paper, we approach the issue of performing service migrations for an unknown and dynamically changing client access pattern. We propose an online algorithm that minimizes the inter-datacenter network overhead, taking into account the network load of migrating a service between two datacenters, as well as the fact that the client request pattern may change “quickly”, before such a migration is amortized. We provide a rigorous mathematical proof showing that the algorithm is 3.8-competitive for a cloud network structured as a tree of multiple datacenters. We briefly discuss how the algorithm can be modified to work on general graph networks with an O(log|V|) probabilistic approximation of the optimal algorithm. Finally, we present an experimental evaluation of the algorithm based on extensive simulations.
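The toy sketch below illustrates only the amortization intuition behind such online algorithms (in the spirit of ski-rental): keep serving remote clients until the accumulated remote-access cost toward some datacenter matches the migration cost, then migrate. It is not the paper's 3.8-competitive algorithm; all costs and the decision rule are illustrative assumptions.

```python
# Toy online migration rule based on amortizing the migration cost.
class OnlineMigrator:
    def __init__(self, migration_cost: float):
        self.migration_cost = migration_cost
        self.location = None
        self.accumulated = {}  # remote-access cost accrued per candidate datacenter

    def serve(self, client_dc: str, access_cost: float) -> str:
        if self.location is None:
            self.location = client_dc
            return f"placed at {client_dc}"
        if client_dc == self.location:
            return f"served locally at {self.location}"
        self.accumulated[client_dc] = self.accumulated.get(client_dc, 0.0) + access_cost
        if self.accumulated[client_dc] >= self.migration_cost:
            # Remote traffic toward this datacenter has paid for a migration.
            self.location = client_dc
            self.accumulated = {}
            return f"migrated to {client_dc}"
        return f"served remotely from {self.location}"

migrator = OnlineMigrator(migration_cost=10.0)
for dc in ["A"] + ["B"] * 10:
    print(migrator.serve(dc, access_cost=1.0))
```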
Conference Paper
People with visual impairments face challenges when navigating indoor environments, such as train stations and shopping malls. Prior approaches either require dedicated hardware that is expensive and bulky or may not be suitable for such complex spaces. This paper aims to propose a practical solution that enables blind travelers to navigate a complex train station independently using a smartphone without the need for any special hardware. Utilizing Bluetooth Low Energy (BLE) technology and a smartphone's built-in compass, we developed StaNavi -- a navigation system that provides turn-by-turn voice directions inside Tokyo Station, one of the world's busiest train stations, which has more than 400,000 passengers daily. StaNavi was iteratively co-designed with blind users to provide features tailored to their needs that include interfaces for one-handed use while walking with a cane and a route overview to provide a picture of the entire journey in advance. It also offers cues that help users orient themselves in convoluted paths or open spaces. A field test with eight blind users demonstrates that all users could reach given destinations in real-life scenarios, showing that our system was effective in a complex and highly crowded environment and has great potential for large-scale deployment.
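A minimal sketch of the localization step in a BLE-based system of this kind is given below: the user's position is approximated by the strongest (highest-RSSI) known beacon, and the stored turn-by-turn instruction for that waypoint is announced. The beacon IDs, positions and instructions are invented placeholders, not StaNavi data.

```python
# Nearest-beacon localization and turn-by-turn instruction lookup (sketch).
from typing import Dict, Optional

BEACON_WAYPOINTS = {
    "beacon-001": "Walk straight for 20 metres toward the ticket gates.",
    "beacon-002": "Turn right and follow the tactile paving.",
    "beacon-003": "The platform stairs are directly ahead.",
}

def strongest_beacon(scans: Dict[str, float]) -> Optional[str]:
    """scans maps beacon id -> RSSI in dBm (less negative means stronger)."""
    known = {b: rssi for b, rssi in scans.items() if b in BEACON_WAYPOINTS}
    if not known:
        return None
    return max(known, key=known.get)

def next_instruction(scans: Dict[str, float]) -> str:
    beacon = strongest_beacon(scans)
    if beacon is None:
        return "No known beacon nearby; please keep walking slowly."
    return BEACON_WAYPOINTS[beacon]

print(next_instruction({"beacon-002": -68.0, "beacon-003": -81.0, "unknown": -60.0}))
```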
Article
For various reasons, the cloud computing paradigm is unable to meet certain requirements (e.g. low latency and jitter, context awareness, mobility support) that are crucial for several applications (e.g. vehicular networks, augmented reality). To fulfil these requirements, various paradigms, such as mobile edge computing, fog computing, and mobile cloud computing, have emerged in recent years. While these edge paradigms share several features, most of the existing research is compartmentalised; no synergies have been explored. This is especially true in the field of security, where most analyses focus only on one edge paradigm, while ignoring the others. The main goal of this study is to holistically analyse the security threats, challenges, and mechanisms inherent in all edge paradigms, while highlighting potential synergies and venues of collaboration. In our results, we will show that all edge paradigms should consider the advances in other paradigms.
Article
In this paper, a novel wearable RGB-D camera based indoor navigation system for the visually impaired is presented. The system guides the visually impaired user from one location to another without a prior map or GPS information. Accurate real-time egomotion estimation, mapping, and path planning in the presence of obstacles are essential for such a system. We perform real-time 6-DOF egomotion estimation using sparse visual features, dense point clouds, and the ground plane to reduce drift from a head-mounted RGB-D camera. The system also builds a 2D probabilistic occupancy grid map for efficient traversability analysis, which is the basis for dynamic path planning and obstacle avoidance. The system can store and reload maps generated while traveling and continually expand the coverage area of navigation. Next, the shortest path between the start location and the destination is generated. The system generates a safe and efficient waypoint based on the traversability analysis result and the shortest path, and updates the waypoint while the user is moving. Appropriate cues are generated and delivered to a tactile feedback system to guide the visually impaired user to the waypoint. The proposed wearable system prototype is composed of multiple modules including a head-mounted RGB-D camera, a standard laptop that runs the navigation software, a smartphone user interface, and a haptic feedback vest. The proposed system achieves real-time navigation performance at 28.6 Hz on average on a laptop, and helps the visually impaired extend the range of their activities and improve their orientation and mobility performance in cluttered environments. We have evaluated the performance of the proposed system in mapping and localization with blindfolded and visually impaired subjects. The mobility experiment results show that navigation in indoor environments with the proposed system successfully avoids collisions and improves the mobility performance of the user compared to conventional and state-of-the-art mobility aid devices.
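The skeleton of the mapping/planning layer described above can be sketched as follows: a 2D occupancy grid marks blocked cells, and a breadth-first search returns the shortest obstacle-free path from the current cell to the destination. The real system uses probabilistic occupancy and continuous replanning; this is only the core idea.

```python
# Shortest obstacle-free path on a binary occupancy grid (BFS sketch).
from collections import deque

def shortest_path(grid, start, goal):
    """grid: list of lists, 0 = free, 1 = occupied; start/goal: (row, col)."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    parents = {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # no obstacle-free path exists

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
print(shortest_path(grid, (0, 0), (3, 3)))
```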
Article
Operator placement plays a key role in reducing the aggregate network overhead within a wireless sensor network (WSN) to extend battery life and the longevity of the network. Consequently, optimal algorithms for the operator placement problem (OPP) are of paramount importance to WSN performance. Unfortunately, the OPP becomes NP-complete when capacity constraints on the WSN nodes are taken into account. There are many algorithms in the literature that tackle the OPP; however, most of them consider tree-structured query graphs without limitations regarding the operators hosted by the WSN nodes. Therefore, there is a need to propose sophisticated approaches such that the problem is solved in an effective fashion. In this paper, we propose a fully distributed approach that takes into account the WSN node capacity constraints. The proposed approach is thoroughly evaluated through simulations and the results reveal that the proposed approach is superior to several state-of-the-art algorithms, such as DRA, DBA, MCFA, dFNS, and GRAL∗ found in the literature.
Article
For programming distributed computing environments such as grids and clouds, workflow has been adopted as an attractive paradigm for its power in expressing a wide range of applications, including scientific computing, multi-tier Web, and big data processing applications. With the development of cloud technology and the extensive deployment of cloud platforms, workflow scheduling in the cloud has become an important research topic. The challenges of the problem lie in the NP-hard nature of task-resource mapping; diverse QoS requirements; on-demand resource provisioning; performance fluctuation and failure handling; hybrid resource scheduling; and data storage and transmission optimization. Consequently, a number of studies focusing on different aspects have emerged in the literature. In this paper, we first provide a taxonomy and comparative review of workflow scheduling algorithms. Then, we make a comprehensive survey of workflow scheduling in cloud environments in a problem-solution manner. Based on the analysis, we also highlight some research directions for future investigation.
Article
Although the High Efficiency Video Coding (HEVC) standard significantly improves the coding efficiency of video compression, it is unacceptable even in offline applications to spend several hours compressing 10 s of high-definition video. In this paper, we propose using a multicore central processing unit (CPU) and an off-the-shelf graphics processing unit (GPU) with 3072 streaming processors (SPs) for HEVC fast encoding, so that the speed optimization does not result in loss of coding efficiency. There are two key technical contributions in this paper. First, we propose an algorithm that is both parallel and fast for the GPU, which can utilize 3072 SPs in parallel to estimate the motion vector (MV) of every prediction unit (PU) in every combination of the coding unit (CU) and PU partitions. Furthermore, the proposed GPU algorithm can avoid coding efficiency loss caused by the lack of a MV predictor (MVP). Second, we propose a fast algorithm for the CPU, which can fully utilize the results from the GPU to significantly reduce the number of possible CU and PU partitions without any coding efficiency loss. Our experimental results show that compared with the reference software, we can encode high-resolution video that consumes 1.9% of the CPU time and 1.0% of the GPU time, with only a 1.4% rate increase.
Conference Paper
Workflow applications can be executed in cloud computing environments in a utility-based fashion. In this paper, we first give a survey of cloud workflow applications and present a cloud-based workflow architecture for Smart Cities. Then a variety of workflow scheduling algorithms are reviewed. The purpose of this paper is to construct a taxonomy for workflow management and scheduling in cloud environments, apply this cloud-based workflow architecture to Smart City settings, and present several research challenges in this area.
Conference Paper
Google has recently finalized a next generation open-source video codec called VP9, as part of the libvpx repository of the WebM project (http://www.webmproject.org/). Starting from the VP8 video codec released by Google in 2010 as the baseline, various enhancements and new tools were added, resulting in the next-generation VP9 bit-stream. This paper provides a brief technical overview of VP9 along with comparisons with other state-of-the-art video codecs H.264/AVC and HEVC on standard test sets. Results show VP9 to be quite competitive with mainstream state-of-the-art codecs.
Article
This paper introduces a new portable camera-based method for helping blind people to recognize indoor objects. Unlike the state-of-the-art techniques, which typically perform the recognition task by limiting it to a single predefined class of objects, we propose here a completely different alternative scheme, defined as coarse description. It aims at expanding the recognition task to multiple objects and, at the same time, keeping the processing time under control by sacrificing some information details. The benefit is to increment the awareness and the perception of a blind person to his direct contextual environment. The coarse description issue is addressed via two image multilabeling strategies which differ in the way image similarity is computed. The first one makes use of the Euclidean distance measure, while the second one relies on a semantic similarity measure modeled by means of Gaussian process (GP) estimation. In order to achieve fast computation capability, both strategies rely on a compact image representation based on compressive sensing. The proposed methodology was assessed on two indoor datasets representing different indoor environments. Encouraging results were achieved in terms of both accuracy and processing time.
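The sketch below illustrates the first multilabeling strategy described above: a query image is represented compactly (here by a random projection standing in for compressive sensing), compared to reference images by Euclidean distance, and inherits the union of labels of its closest references. Dimensions and the projection matrix are illustrative assumptions.

```python
# Euclidean-distance image multilabeling over compressed representations (sketch).
import numpy as np

rng = np.random.default_rng(0)
# Random measurement matrix standing in for a compressive-sensing projection.
PROJECTION = rng.standard_normal((64, 4096))

def compress(image_vector: np.ndarray) -> np.ndarray:
    """Project a flattened image (length 4096 here) to a compact 64-D signature."""
    return PROJECTION @ image_vector

def multilabel(query_vec, references, k=2):
    """references: list of (compressed_vector, set_of_labels) for indoor scenes."""
    q = compress(query_vec)
    dists = [np.linalg.norm(q - ref_vec) for ref_vec, _ in references]
    nearest = np.argsort(dists)[:k]
    labels = set()
    for i in nearest:
        labels |= references[i][1]   # union of labels of the closest references
    return labels
```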
Article
High Efficiency Video Coding (HEVC) is currently being prepared as the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goal of the HEVC standardization effort is to enable significantly improved compression performance relative to existing standards, in the range of 50% bit-rate reduction for equal perceptual video quality. This paper provides an overview of the technical features and characteristics of the HEVC standard.
Article
We present a study on grocery detection using our object detection system, ShelfScanner, which seeks to allow a visually impaired user to shop at a grocery store without additional human assistance. ShelfScanner allows online detection of items on a shopping list, in video streams in which some or all items could appear simultaneously. To deal with the scale of the object detection task, the system leverages the approximate planarity of grocery store shelves to build a mosaic in real time using an optical flow algorithm. The system is then free to use any object detection algorithm without incurring a loss of data due to processing time. For purposes of speed we use a multiclass naive-Bayes classifier inspired by NIMBLE, which is trained on enhanced SURF descriptors extracted from images in the GroZi-120 dataset. It is then used to compute per-class probability distributions on video keypoints for final classification. Our results suggest ShelfScanner could be useful in cases where high-quality training data is available.
Conference Paper
Nowadays, computer vision technology is helping the visually impaired by recognizing objects in their surroundings. Unlike navigation and wayfinding, for which research is active, there are no camera-based systems on the market that help the blind find personal items. This paper proposes an object recognition method to help blind people find missing items using Speeded-Up Robust Features (SURF). SURF extracts distinctive invariant features that can be utilized to perform reliable matching between different images in multiple scenarios. These features are invariant to image scale, translation, rotation, illumination, and partial occlusion. The proposed recognition process begins by matching individual features of the user-queried object to a database of features of different personal items which are saved in advance. Experimental results demonstrate the effectiveness and efficiency of the proposed method.
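The matching step of such a method can be sketched with OpenCV as below. ORB is used here as a freely available stand-in for SURF (SURF lives in OpenCV's non-free contrib module); the descriptor matching and ratio test illustrate the same idea of matching a queried item against stored references.

```python
# Feature matching of a queried item against a stored reference image (sketch).
import cv2

def count_good_matches(scene_path: str, reference_path: str, ratio: float = 0.75) -> int:
    scene = cv2.imread(scene_path, cv2.IMREAD_GRAYSCALE)
    reference = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create()
    _, s_desc = orb.detectAndCompute(scene, None)
    _, r_desc = orb.detectAndCompute(reference, None)
    if s_desc is None or r_desc is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(s_desc, r_desc, k=2)
    # Lowe-style ratio test to keep only distinctive matches.
    good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good)

# The queried item would be reported as found when the number of good matches
# exceeds a threshold, e.g.:
# if count_good_matches("scene.jpg", "my_keys.jpg") > 15: announce("keys found")
```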
Article
H.264/AVC is the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goals of the H.264/AVC standardization effort have been enhanced compression performance and provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "nonconversational" (storage, broadcast, or streaming) applications. H.264/AVC has achieved a significant improvement in rate-distortion efficiency relative to existing standards. This article provides an overview of the technical features of H.264/AVC, describes profiles and applications for the standard, and outlines the history of the standardization process.
Bossen, F.: Common test conditions and software reference configurations.
Fouladi, S., Wahby, R.S., Shacklett, B., Balasubramaniam, K., Zeng, W., Bhalerao, R., Sivaraman, A., Porter, G., Winstein, K.: Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads. In: 2017 Symposium on Networked Systems Design and Implementation (NSDI), pp. 363-376 (2017).
Šimunović, L., Anđelić, V., Pavlinušić, I.: Blind people guidance system. In: Central European Conference on Information and Intelligent Systems (2012).
Chi, C.C., Alvarez-Mesa, M., Juurlink, B., Clare, G., Henry, F., Pateux, S., Schierl, T.: Parallel Scalability and Efficiency of HEVC Parallelization Approaches. IEEE Transactions on Circuits and Systems for Video Technology 22, 1827-1838 (2012).
Versatile Video Coding (VVC).