Kailun Yang
Karlsruhe Institute of Technology | KIT · Institute for Anthropomatics and Robotics

Postdoctoral Researcher

About

136 Publications
48,836 Reads
1,423 Citations

Publications (136)
Article
Full-text available
The introduction of RGB-Depth (RGB-D) sensors into the visually impaired people (VIP)-assisting area has stirred great interest among many researchers. However, the detection range of RGB-D sensors is limited by a narrow depth field angle and sparse depth maps at a distance, which hampers broader and longer-range traversability awareness. This paper proposes...
Article
Full-text available
The introduction of RGB-D sensors is a revolutionary force that offers a portable, versatile and cost-effective solution for navigational assistance for the visually impaired. RGB-D sensors on the market such as Microsoft Kinect, Asus Xtion and Intel RealSense are mature products, but all have a minimum detection distance of about 800 mm. This results i...
Article
Full-text available
Navigational assistance aims to help visually impaired people move through the environment safely and independently. This topic becomes challenging as it requires detecting a wide variety of scenes to provide higher-level assistive awareness. Vision-based technologies with monocular detectors or depth sensors have sprung up within several years of r...
Preprint
In this work, we introduce panoramic panoptic segmentation, as the most holistic scene understanding, both in terms of Field of View (FoV) and image-level understanding for standard camera-based input. A complete surrounding understanding provides a maximum of information to a mobile agent, which is essential for any intelligent vehicle in order to...
Preprint
Full-text available
Panoramic Annular Lens (PAL), composed of only a few lenses, has great potential in panoramic surrounding sensing tasks for mobile and wearable devices because of its tiny size and large Field of View (FoV). However, the image quality of a tiny-volume PAL is confined by optical limits due to the lack of lenses for aberration correction. In this paper, we propos...
Preprint
Full-text available
Human Pose Estimation (HPE) based on RGB images has experienced rapid development benefiting from deep learning. However, event-based HPE has not been fully studied, though it holds great potential for applications in extreme scenes and efficiency-critical conditions. In this paper, we are the first to estimate 2D human pose directly from 3D event...
Preprint
Full-text available
With the rapid development of high-speed communication and artificial intelligence technologies, human perception of real-world scenes is no longer limited to the use of small Field of View (FoV) and low-dimensional scene detection devices. Panoramic imaging emerges as the next generation of innovative intelligent instruments for environmental perc...
Thesis
Full-text available
Exploring an unfamiliar indoor environment and avoiding obstacles is challenging for visually impaired people. Currently, several assistive applications for visually impaired people, combining Simultaneous Localization And Mapping (SLAM) and semantic segmentation technologies, achieve localization of users, obstacle detection, destin...
Preprint
Full-text available
Structured Visual Content (SVC) such as graphs, flow charts, or the like are used by authors to illustrate various concepts. While such depictions allow the average reader to better understand the contents, images containing SVCs are typically not machine-readable. This, in turn, not only hinders automated knowledge aggregation, but also the percep...
Preprint
Full-text available
Driver observation models are rarely deployed under perfect conditions. In practice, illumination, camera placement and type differ from the ones present during training and unforeseen behaviours may occur at any time. While observing the human behind the steering wheel leads to more intuitive human-vehicle-interaction and safer driving, it require...
Preprint
Full-text available
Exploring an unfamiliar indoor environment and avoiding obstacles is challenging for visually impaired people. Currently, several approaches achieve the avoidance of static obstacles based on the mapping of indoor scenes. To solve the issue of distinguishing dynamic obstacles, we propose an assistive system with an RGB-D sensor to detect dynamic in...
Preprint
Full-text available
Autonomous vehicles utilize urban scene segmentation to understand the real world like a human and react accordingly. Semantic segmentation of normal scenes has experienced a remarkable rise in accuracy on conventional benchmarks. However, a significant portion of real-life accidents features abnormal scenes, such as those with object deformations,...
Preprint
Local feature matching is a computationally intensive task at the subpixel level. While detector-based methods coupled with feature descriptors struggle in low-texture scenes, CNN-based methods with a sequential extract-to-match pipeline fail to make use of the matching capacity of the encoder and tend to overburden the decoder for matching. In co...
Preprint
The performance of semantic segmentation of RGB images can be advanced by exploiting informative features from supplementary modalities. In this work, we propose CMX, a vision-transformer-based cross-modal fusion framework for RGB-X semantic segmentation. To generalize to different sensing modalities encompassing various uncertainties, we consider...
Preprint
Traditional video-based human activity recognition has experienced remarkable progress linked to the rise of deep learning, but this progress has been slower for the downstream task of driver behavior understanding. Understanding the situation inside the vehicle cabin is essential for Advanced Driver Assistance Systems (ADAS) as it enables ident...
Preprint
Full-text available
Panoramic images with their 360-degree directional view encompass exhaustive information about the surrounding space, providing a rich foundation for scene understanding. To unfold this potential in the form of robust panoramic segmentation models, large quantities of expensive, pixel-wise annotations are crucial for success. Such annotations are a...
Preprint
Full-text available
Visual-Inertial Odometry has attracted extensive attention in the field of autonomous driving and robotics. The size of the Field of View (FoV) plays an important role in Visual Odometry (VO) and Visual-Inertial Odometry (VIO), as a large FoV enables the perception of a wide range of surrounding scene elements and features. However, when the field of the cam...
Preprint
For scene understanding in robotics and automated driving, there is a growing interest in solving semantic segmentation tasks with transformer-based methods. However, effective transformers are always too cumbersome and computationally expensive to solve semantic segmentation in real time, which is desired for robotic systems. Moreover, due to the...
Preprint
Full-text available
Optical flow estimation is a basic task in self-driving and robotics systems, which enables the temporal interpretation of the traffic scene. Autonomous vehicles clearly benefit from the ultra-wide Field of View (FoV) offered by 360-degree panoramic sensors. However, due to the unique imaging process of panoramic images, models designed for pinhole images...
Preprint
Full-text available
Automatically understanding human behaviour allows household robots to identify the most critical needs and plan how to assist the human according to the current situation. However, the majority of such methods are developed under the assumption that a large number of labelled training examples is available for all concepts-of-interest. Robots, on...
Preprint
Full-text available
Optical flow estimation is an essential task in self-driving systems, which helps autonomous vehicles perceive temporal continuity information of surrounding scenes. The calculation of all-pair correlation plays an important role in many existing state-of-the-art optical flow estimation methods. However, the reliance on local knowledge often limits...
Preprint
Full-text available
We explore the problem of automatically inferring the number of kilocalories expended by a person during physical activity from video observation. To study this under-researched task, we introduce Vid2Burn -- an omni-source benchmark for estimating caloric expenditure from video data featuring both high- and low-intensity activities for which we d...
Article
Full-text available
The robustness of semantic segmentation on edge cases of traffic scenes is a vital factor for the safety of intelligent transportation. However, most of the critical scenes of traffic accidents are extremely dynamic and previously unseen, which seriously harms the performance of semantic segmentation methods. In addition, the delay of the traditional...
Preprint
Full-text available
The robustness of semantic segmentation on edge cases of traffic scenes is a vital factor for the safety of intelligent transportation. However, most of the critical scenes of traffic accidents are extremely dynamic and previously unseen, which seriously harms the performance of semantic segmentation methods. In addition, the delay of the traditional...
Preprint
Full-text available
Human affect recognition is a well-established research area with numerous applications, e.g., in psychological care, but existing methods assume that all emotions-of-interest are given a priori as annotated training examples. However, the rising granularity and refinements of the human emotional spectrum through novel psychological theories and th...
Preprint
Full-text available
Traditional frame-based cameras inevitably suffer from motion blur due to long exposure times. As a kind of bio-inspired camera, the event camera records the intensity changes in an asynchronous way with high temporal resolution, providing valid image degradation information within the exposure time. In this paper, we rethink the event-based image...
Article
Full-text available
Autonomous vehicles clearly benefit from the expanded Field of View (FoV) of 360° sensors, but modern semantic segmentation approaches rely heavily on annotated training data which is rarely available for panoramic images. We look at this problem from the perspective of domain adaptation and bring panoramic semantic segmentation to a setting, where...
Preprint
Full-text available
Autonomous vehicles clearly benefit from the expanded Field of View (FoV) of 360-degree sensors, but modern semantic segmentation approaches rely heavily on annotated training data which is rarely available for panoramic images. We look at this problem from the perspective of domain adaptation and bring panoramic semantic segmentation to a setting,...
Preprint
Full-text available
Transparent objects, such as glass walls and doors, constitute architectural obstacles hindering the mobility of people with low vision or blindness. For instance, the open space behind glass doors is inaccessible, unless it is correctly perceived and interacted with. However, traditional assistive technologies rarely cover the segmentation of thes...
Preprint
Full-text available
Depth estimation, as a necessary cue for converting 2D images into 3D space, has been applied in many machine vision areas. However, to achieve an entire surrounding 360-degree geometric sensing, traditional stereo matching algorithms for depth estimation are limited due to large noise, low accuracy, and strict requirements for multi-camera calibr...
Article
Full-text available
Depth estimation, as a necessary cue for converting 2D images into 3D space, has been applied in many machine vision areas. However, to achieve an entire surrounding 360° geometric sensing, traditional stereo matching algorithms for depth estimation are limited due to large noise, low accuracy, and strict requirements for multi-camera calibration....
Preprint
Full-text available
Lacking the ability to sense ambient environments effectively, blind and visually impaired people (BVIP) face difficulty in walking outdoors, especially in urban areas. Therefore, tools for assisting BVIP are of great importance. In this paper, we propose a novel "flying guide dog" prototype for BVIP assistance using a drone and street view semantic...
Preprint
Full-text available
Intelligent vehicles clearly benefit from the expanded Field of View (FoV) of the 360-degree sensors, but the vast majority of available semantic segmentation training images are captured with pinhole cameras. In this work, we look at this problem through the lens of domain adaptation and bring panoramic semantic segmentation to a setting, where la...
Conference Paper
Full-text available
Classic computer vision algorithms, instance segmentation, and semantic segmentation cannot provide a holistic understanding of the surroundings for the visually impaired. In this paper, we utilize panoptic segmentation to assist the navigation of visually impaired people by offering both things and stuff awareness in the proximity of the visuall...
Preprint
Full-text available
Independently exploring unknown spaces or finding objects in an indoor environment is a daily but challenging task for visually impaired people. However, common 2D assistive systems lack depth relationships between various objects, making it difficult to obtain an accurate spatial layout and the relative positions of objects. To tackle these issues, w...
Preprint
Full-text available
Common fully glazed facades and transparent objects present architectural barriers and impede the mobility of people with low vision or blindness; for instance, a path detected behind a glass door is inaccessible unless it is correctly perceived and reacted to. However, segmenting these safety-critical objects is rarely covered by conventional assisti...
Preprint
Full-text available
At the heart of all automated driving systems is the ability to sense the surroundings, e.g., through semantic segmentation of LiDAR sequences, which experienced a remarkable progress due to the release of large datasets such as SemanticKITTI and nuScenes-LidarSeg. While most previous works focus on sparse segmentation of the LiDAR input, dense out...
Article
Full-text available
In this paper, we propose panoramic annular simultaneous localization and mapping (PA-SLAM), a visual SLAM system based on a panoramic annular lens. A hybrid point selection strategy is put forward in the tracking front end, which ensures repeatability of key points and enables loop closure detection based on the bag-of-words approach. Every detect...
Article
Full-text available
Scene sonification is a powerful technique to help Visually Impaired People (VIP) understand their surroundings. Existing methods usually perform sonification on the entire image of the surrounding scene acquired by a standard camera or on a priori static obstacles acquired by image processing algorithms on the RGB image of the surrounding scene...
Preprint
Full-text available
Aerial pixel-wise scene perception of the surrounding environment is an important task for UAVs (Unmanned Aerial Vehicles). Previous research works mainly adopt conventional pinhole cameras or fisheye cameras as the imaging device. However, these imaging systems cannot achieve a large Field of View (FoV), small size, and light weight at the same time....
Preprint
Full-text available
Convolutional Networks (ConvNets) excel at semantic segmentation and have become a vital component for perception in autonomous driving. Enabling an all-encompassing view of street-scenes, omnidirectional cameras present themselves as a perfect fit in such systems. Most segmentation models for parsing urban environments operate on common, narrow Fi...
Preprint
Full-text available
Classic computer vision algorithms, instance segmentation, and semantic segmentation cannot provide a holistic understanding of the surroundings for the visually impaired. In this paper, we utilize panoptic segmentation to assist the navigation of visually impaired people by offering both things and stuff awareness in the proximity of the visually...
Preprint
Full-text available
As scene information, including objectness and scene type, is important for people with visual impairment, in this work we present a multi-task efficient perception system for the scene parsing and recognition tasks. Building on the compact ResNet backbone, our designed network architecture has two paths with shared parameters. In the structur...
Preprint
Full-text available
Street scene change detection continues to capture researchers' interest in the computer vision community. It aims to identify the changed regions of the paired street-view images captured at different times. The state-of-the-art network based on the encoder-decoder architecture leverages the feature maps at the corresponding level between two cha...
Preprint
Full-text available
In this work, we introduce panoramic panoptic segmentation as the most holistic scene understanding both in terms of field of view and image level understanding. A complete surrounding understanding provides a maximum of information to the agent, which is essential for any intelligent vehicle in order to make informed decisions in a safety-critical...