Yong Liu
Zhejiang University | ZJU · Institute of Cyber-Systems and Control

Doctor of Philosophy

About

Publications: 252
Reads: 35,848
Citations: 2,683
Introduction
Yong Liu currently works at the Institute of Cyber-Systems and Control, Zhejiang University. Yong does research in Artificial Intelligence, Data Mining, and Artificial Neural Networks. Their current project is 'Visual Object Tracking'.
Additional affiliations
  • January 2017 - present: Professor, Zhejiang University
  • January 2011 - December 2016: Associate Professor, Zhejiang University

Publications

Article
Full-text available
Scene classification is a fundamental perception task for environmental understanding in today's robotics. In this paper, we attempt to exploit deep learning, a popular machine learning technique, to enhance scene understanding, particularly in robotics applications. As scene images have larger diversity than the iconic object imag...
Article
In this paper, we present a stereo visual-inertial odometry (VIO) algorithm built from three separate Kalman filters, i.e., an attitude filter, an orientation filter, and a position filter. Our algorithm carries out orientation and position estimation with three filters working on different fusion intervals, which can provide more robustness even w...
Conference Paper
Structured output support vector machine (SVM) based tracking algorithms have recently shown favorable performance. Nonetheless, time-consuming candidate sampling and complex optimization limit their use in real-time applications. In this paper, we propose a novel large margin object tracking method which absorbs the strong discriminative ability from...
Article
Full-text available
In this paper, we develop a robust and efficient visual SLAM system that utilizes heterogeneous point and line features. Built on ORB-SLAM [1], the proposed system consists of stereo matching, frame tracking, local mapping, loop detection, and bundle adjustment of both point and line features. In particular, as the main theoretical contributions o...
Article
The task of multi-object tracking in UAV videos via deep learning methods has become an important research direction. However, some current multiple object tracking methods do not handle the relationship between object detection and tracking well, and decisions on how to make good use of temporal information can affect tracking performance...
Preprint
Poles and building edges are frequently observable objects on urban roads, conveying reliable hints for various computer vision tasks. To repeatedly extract them as features and perform association between discrete LiDAR frames for registration, we propose the first learning-based feature segmentation and description model for 3D lines in LiDAR p...
Article
Full-text available
In the TBM (Tunnel Boring Machine) construction process, the rock size analysis system plays an important role in assisting driving. Its core algorithm is based on semantic segmentation, which makes dataset acquisition challenging in real applications. To relieve this problem, this paper proposes a virtual-realistic fused dataset, ViRFD for short....
Preprint
In this paper, we introduce DA$^2$, the first large-scale dual-arm dexterity-aware dataset for the generation of optimal bimanual grasping pairs for arbitrary large objects. The dataset contains about 9M pairs of parallel-jaw grasps, generated from more than 6000 objects and each labeled with various grasp dexterity measures. In addition, we propos...
Article
Full-text available
Multi-attribute decision making (MADM) with attribute values as interval-valued intuitionistic fuzzy numbers (IVIFNs) is essentially a second-order decision making problem with uncertainty. To this end, the partial connection number (PCN) of set pair analysis is applied to MADM with IVIFNs. The PCN is an adjoin function of the connection number (CN...
Preprint
Full-text available
Recently, the image-wise implicit neural representation of videos, NeRV, has gained popularity for its promising results and swift speed compared to regular pixel-wise implicit representations. However, the redundant parameters within the network structure can cause a large model size when scaling up for desirable performance. The key reason of thi...
Article
In the practical application of restoring low-resolution gray-scale images, we generally need to run three separate processes for the target device: image colorization, super-resolution, and down-sampling. However, this pipeline of independent processes is redundant and inefficient, and some inner features could have been shared. T...
Preprint
Motivated by biological evolution, this paper explains the rationality of the Vision Transformer by analogy with the proven, practical Evolutionary Algorithm (EA) and derives that both have a consistent mathematical formulation. Then, inspired by effective EA variants, we propose a novel pyramid EATFormer backbone that only contains the proposed EA-b...
Preprint
Full-text available
Accurate and reliable sensor calibration is essential to fuse LiDAR and inertial measurements, which are usually available in robotic applications. In this paper, we propose a novel LiDAR-IMU calibration method within the continuous-time batch-optimization framework, where the intrinsics of both sensors and the spatial-temporal extrinsics between s...
Article
Surrogate models are widely used to model problems with high computational cost, such as industrial simulation or engineering optimization, when the amount of sampled data available for modeling is greatly limited. They can significantly improve the efficiency of complex calculations by modeling original expensive problems with simpler computation-saving function...
Article
Full-text available
Achieving efficient and accurate feature tracking on event cameras is a fundamental step for practical high-level applications, such as simultaneous localization and mapping (SLAM), structure from motion (SfM), and visual odometry (VO) in GNSS (Global Navigation Satellite System)-denied environments. Although many asynchronous tracking methods pu...
Article
Full-text available
LiDAR-based place recognition (LPR) is one of the basic capabilities of robots: it retrieves scenes from maps and identifies previously visited locations based on 3D point clouds. As robots often pass the same place from different viewpoints, LPR methods are supposed to be robust to rotation, a property lacking in most current learning-based approache...
Article
Full-text available
As one of the key technologies of SLAM, loop-closure detection can help eliminate the cumulative errors of the odometry. Many of the current LiDAR-based SLAM systems do not integrate a loop-closure detection module, so they will inevitably suffer from cumulative errors. This paper proposes a semantic-based place recognition method called Semantic S...
Preprint
This paper presents a novel Region-Aware Face Swapping (RAFSwap) network to achieve identity-consistent, harmonious, high-resolution face generation in a local-global manner: 1) Local Facial Region-Aware (FRA) branch augments local identity-relevant features by introducing the Transformer to effectively model misaligned cross-scale semantic...
Preprint
Density-based and classification-based methods have dominated unsupervised anomaly detection in recent years, while reconstruction-based methods are rarely mentioned because of their poor reconstruction ability and low performance. However, the latter require no costly extra training samples for unsupervised training, which makes them more practical, so this paper f...
Article
In this paper, we propose a novel traffic flow prediction approach called Graph Diffusing trans-Former (GDFormer). GDFormer has a transformer architecture composed of an encoder sequence and a decoder sequence. Both the encoder sequence and the decoder sequence in GDFormer are built from the newly designed Graph Diffusing Attenti...
Preprint
In the practical application of restoring low-resolution gray-scale images, we generally need to run three separate processes for the target device: image colorization, super-resolution, and down-sampling. However, this pipeline of independent processes is redundant and inefficient, and some inner features could have been shared. T...
Preprint
Photon-efficient imaging with single-photon LiDAR captures the 3D structure of a scene from only a few detected signal photons per pixel. However, existing computational methods for photon-efficient imaging are pre-tuned on restricted scenarios or trained on simulated datasets. When applied to realistic scenarios whose signal-to-background r...
Preprint
Full-text available
Single-photon light detection and ranging (LiDAR) has been widely applied to 3D imaging in challenging scenarios. However, limited signal photon counts and high noise in the collected data make it very challenging to predict the depth image precisely. In this paper, we propose a pixel-wise residual shrinkage network for photon-efficient ima...
Preprint
In a Riemannian manifold, the Ricci flow is a partial differential equation that evolves the metric to make it more regular. We hope that topological structures from such metrics may be used to assist machine learning tasks. However, such work is still missing. In this paper, we bridge this gap between the Ricci flow and deep n...
Preprint
Full-text available
Despite their success on benchmark datasets, most advanced face super-resolution models perform poorly in real scenarios because of the remarkable domain gap between real images and the synthesized training pairs. To tackle this problem, we propose a novel domain-adaptive degradation network for face super-resolution in the wild. This degradation...
Preprint
Loss functions play an important role in training deep-network-based object detectors. The most widely used evaluation metric for object detection is Average Precision (AP), which captures the performance of localization and classification sub-tasks simultaneously. However, due to the non-differentiable nature of the AP metric, traditional object d...
Article
Full-text available
Deep neural networks (DNNs) have strong fitting ability on a variety of computer vision tasks, but they also require intensive computing power and large storage space, which are not always available in portable smart devices. Although a lot of studies have contributed to the compression of image classification networks, there are few model compress...
Article
Full-text available
To objectively evaluate the influence of hesitant fuzziness on the ranking of alternatives in multi-attribute decision making with hesitant fuzzy or probabilistic hesitant fuzzy information, the binary connection number of set pair analysis is applied to hesitant fuzzy multi-attribute decision making. The hesitant or probabilistic hesitant fuzzy se...
Article
Multi-Agent Path Finding has been widely studied in the past few years due to its broad application in the field of robotics and AI. However, previous solvers rely on several simplifying assumptions. This limits their applicability in numerous real-world domains that adopt nonholonomic car-like agents rather than holonomic ones. In this paper, we g...
Article
This paper presents an algorithm that generates distributed collision-free velocities for multiple robots while maintaining formation as much as possible. The adaptive formation problem is cast as a sequential decision-making problem, which is solved using reinforcement learning that trains several distributed policies to avoid dynamic obstacles on the to...
Preprint
Full-text available
Referring image segmentation is a typical multi-modal task that aims to generate a binary mask for the referent described in a given language expression. Prior art adopts a bimodal solution, taking images and language as two modalities within an encoder-fusion-decoder pipeline. However, this pipeline is sub-optimal for the target task for two reaso...
Preprint
The Ricci flow is a partial differential equation for evolving the metric in a Riemannian manifold to make it more regular. However, in most cases, the Ricci flow tends to develop singularities and lead to divergence of the solution. In this paper, we propose the linearly nearly Euclidean metric to assist manifold micro-surgery, which means that we...
Article
Audio-guided face reenactment aims to generate authentic target faces whose facial expressions match the input audio, and many learning-based methods have successfully achieved this. However, most methods can only reenact a particular person once trained, or suffer from low-quality generation of the target images. Also, nearly none of th...
Preprint
Full-text available
Outdoor scene completion is a challenging issue in 3D scene understanding that plays an important role in intelligent robotics and autonomous driving. Due to the sparsity of LiDAR acquisition, 3D scene completion and semantic segmentation are far more complex. Since semantic features can provide constraints and semantic priors for completio...
Preprint
The canonical approach to video action recognition requires a neural model to perform a classic, standard 1-of-N majority vote task. Such models are trained to predict a fixed set of predefined categories, limiting their transferability to new datasets with unseen concepts. In this paper, we provide a new perspective on action recognition by attaching...
Preprint
Full-text available
In this paper, we propose a highly accurate continuous-time trajectory estimation framework dedicated to SLAM (Simultaneous Localization and Mapping) applications, which enables effective fusion of high-frequency and asynchronous sensor data. We apply the proposed framework to a 3D LiDAR-inertial system for evaluation. The proposed method adopts a no...
Article
Filter pruning is a significant feature selection technique to shrink the existing feature fusion schemes (especially on convolution calculation and model size), which helps to develop more efficient feature fusion models while maintaining state-of-the-art performance. In addition, it reduces the storage and computation requirements of deep neural...
Preprint
Remarkable results have been achieved by DCNN-based self-supervised depth estimation approaches. However, most of these approaches can only handle either daytime or nighttime images, and their performance degrades on all-day images due to the large domain shift and the variation in illumination between day and night images. To relieve these limita...
Preprint
Full-text available
Place recognition gives a SLAM system the ability to correct cumulative errors. Unlike images, which contain rich texture features, point clouds carry almost purely geometric information, which makes place recognition based on point clouds challenging. Existing works usually encode low-level features such as coordinate, normal, reflection intensity, etc.,...
Preprint
Full-text available
LiDAR-based SLAM systems are admittedly more accurate and stable than others, while their loop closure detection is still an open issue. With the development of 3D semantic segmentation for point clouds, semantic information can be obtained conveniently and steadily; it is essential for high-level intelligence and conducive to SLAM. In this paper, we present...
Article
This paper proposes a valve stiction detection strategy based on a convolutional neural network (CNN). Considering the common characteristics of industrial time series signals, the strategy is developed to learn features on multiple timescales automatically. Unlike traditional approaches that use hand-crafted features, the proposed strateg...
Article
Full-text available
Recent works in the person re-identification task mainly focus on the model accuracy while ignoring factors related to efficiency, e.g., model size and latency, which are critical for practical application. In this paper, we propose a novel Hierarchical and Efficient Network (HENet) that learns hierarchical global, partial, and recovery features en...
Article
Full-text available
This paper mainly focuses on calculating the volume of materials in warehouses where sand and gravel are stored, and on monitoring in real time whether materials are running short. Specifically, we propose using a sandpile model and the point cloud projection obtained from LiDAR sensors to calculate the material volume. We use distributed edge c...
Article
Full-text available
The measurement accuracy of wind direction and wind speed is very important for unmanned sailboat control, but both mature mechanical wind sensors and ultrasonic wind sensors have serious drawbacks when applied to unmanned sailboats. Inspired by previous works on neural networks, we propose a low-cost, real-time, and robust wind measurement sy...
Preprint
Recently, Space-Time Memory Network (STM) based methods have achieved state-of-the-art performance in semi-supervised video object segmentation (VOS). A critical problem in this task is how to model the dependency both among different frames and inside every frame. However, most of these methods neglect the spatial relationships (inside each frame)...
Preprint
Inspired by biological evolution, we explain the rationality of Vision Transformer by analogy with the proven practical Evolutionary Algorithm (EA) and derive that both of them have consistent mathematical representation. Analogous to the dynamic local population in EA, we improve the existing transformer structure and propose a more efficient EAT...
Article
Quadruped robots have better terrain adaptability and more flexible movement capabilities than traditional robots. In this paper, we innovatively apply them to person-following tasks and propose an efficient motion planning scheme for quadruped robots to generate a flexible and effective trajectory in confined spaces. The method builds a real-time loca...