About
Publications: 90
Reads: 49,198
Citations: 891
Introduction
I am an associate professor with the College of Computer, NUDT, China. I received my Ph.D. degree in computer science from the University of Tübingen, Germany, in 2014. My current research interests mainly include swarm intelligence, task allocation, SLAM, and active perception. I serve as a reviewer or PC member for various robotics journals and conferences. My work can be found at
https://scholar.google.com/citations?user=2swWUxwAAAAJ
Education
October 2010 - August 2014
February 2010 - September 2010
September 2007 - December 2009
Publications (90)
In this paper, we present an onboard monocular vision system for autonomous takeoff, hovering and landing of a Micro Aerial Vehicle (MAV). Since pose information with metric scale is critical for autonomous flight of a MAV, we present a novel solution to six degrees of freedom (DOF) pose estimation. It is based on a single image of a typical landin...
This paper presents a novel solution for micro aerial vehicles (MAVs) to autonomously search for and land on an arbitrary landing site using real-time monocular vision. The autonomous MAV is provided with only one single reference image of the landing site with an unknown size before initiating this task. We extend a well-known monocular visual SLA...
This paper extends a monocular visual simultaneous localization and mapping (SLAM) system to utilize two cameras with non-overlapping fields of view (FOVs). We use it to enable autonomous navigation of a micro aerial vehicle (MAV) in unknown environments. The methodology behind this system can easily be extended to multi-c...
In this letter, we introduce PolarMesh, a star-convex approximation of a 3D object based on spherical projection that can be applied to monocular object pose and shape estimation. The proposed PolarMesh can be stored in a discrete 2D map that allows a trivial conversion between it and the object surface, as the adjacent map elements form faces of...
Dragon boat racing, a popular aquatic folklore team sport, is traditionally held during the Dragon Boat Festival. Inspired by this event, we propose a novel human‐based meta‐heuristic algorithm called dragon boat optimization (DBO) in this paper. It models the unique behaviours of each crew member on the dragon boat during the race by introducing s...
Collaboration in Multi-Agent Systems (MASs) is crucial but challenging in robotics, especially in heterogeneous MASs where robots have different capabilities. Nowadays, the key issue in research on collaboration in MASs is to fully utilize the capabilities of heterogeneous agents. To address this issue, we propose Auction-Based Behavior Tree Evolut...
Adversarial patch attacks are commonly used to attack aerial imagery object detectors. However, most existing methods are designed for white-box settings, which are impractical in real-world scenarios. In this work, we propose a novel framework for black-box adversarial patch attacks against aerial imagery object detectors using differential evolut...
Leveraging the open-world understanding capacity of large-scale visual-language pre-trained models has become a hot spot in point cloud classification. Recent approaches rely on transferable visual-language pre-trained models, classifying point clouds by projecting them into 2D images and evaluating consistency with textual prompts. These methods b...
Dragon Boat Racing, a popular aquatic folklore team sport, is traditionally held during the Dragon Boat Festival. Inspired by this event, we propose a novel human-based meta-heuristic algorithm called dragon boat optimization (DBO) in this paper. It models the unique behaviors of each crew member on the dragon boat during the race by introducing so...
Prompt tuning provides a low-cost way of adapting vision-language models (VLMs) for various downstream vision tasks without requiring updating the huge pre-trained parameters. Dispensing with the conventional manual crafting of prompts, the recent prompt tuning method of Context Optimization (CoOp) introduces adaptable vectors as text prompts. Neve...
Acquiring medical images while maintaining patient information confidentiality is a difficult task, which leads to a lack of sufficient data for deep learning-based disease detection. To address the challenges of few-shot learning in such scenarios, researchers generally resort to the transfer learning ability of the pre-trained model. However, in pr...
The Named Entity Recognition (NER) task aims to identify named entities in unstructured text and classify them into corresponding entity types. Existing pretraining models typically utilize BERT models to learn word embeddings at the character level, disregarding the semantic relationships between phrases. They also pay less attention to long-distanc...
Leveraging the open-world understanding capacity of large-scale visual-language pre-trained models has become a hot-spot in point cloud classification. Recent approaches rely on transferable visual-language pre-trained models, classifying point clouds by projecting them into 2D images and evaluating consistency with textual prompts. These methods b...
Classifying and accurately locating a visual category with few annotated training samples has motivated the few-shot object detection technique in computer vision, which transfers a source-domain detection model to the target domain. Under this paradigm, however, such a transferred source-domain detection model usually encounters difficu...
The emerging event cameras have the potential to be an excellent complement for standard cameras within various visual tasks, especially in illumination‐changing environments or situations requiring high‐temporal resolution. Herein, an event‐based stereo visual odometry (VO) system via adaptive time‐surface (TS) and truncated signed distance functi...
Coverage path planning (CPP) is the foundation of multiple robotic applications. The efficiency of CPP is affected by the local extremum, which describes a situation with the robot surrounded by obstacles and explored areas, even if unexplored areas remain in the environment. Most online CPP methods reactively deal with the local extremum after the...
Deep neural networks (DNNs) are prone to the notorious catastrophic forgetting problem when learning new tasks incrementally. Class-incremental learning (CIL) is a promising solution to tackle the challenge and learn new classes while not forgetting old ones. Existing CIL approaches adopted stored representative exemplars or complex generative mode...
Coverage path planning (CPP) of multiple Dubins robots has been extensively applied in aerial monitoring, marine exploration, and search and rescue. Existing multi-robot coverage path planning (MCPP) research uses exact or heuristic algorithms to address coverage applications. However, several exact algorithms always provide precise area division ra...
This paper proposes an air-ground collaborative unmanned system path planning framework based on a specific search and rescue environment. In recent years, UAVs have been widely used in a variety of scenarios due to their many advantages, including aiding path planning for UGVs on the ground. Most path planning assumes a digital map. However, in si...
The emerging event cameras are bio-inspired sensors that can output pixel-level brightness changes at extremely high rates, and event-based visual-inertial odometry (VIO) is widely studied and used in autonomous robots. In this paper, we propose an event-based stereo VIO system, namely ESVIO. Firstly, we present a novel direct event-based VIO metho...
Autonomous exploration is an essential task for various applications of unmanned aerial vehicles (UAVs), but there is currently a lack of available energy-constrained multi-UAV exploration methods. In this paper, we propose the RTN-Explorer, an environment exploration strategy that satisfies the energy constraints. The goal of environment explorat...
Self-supervised monocular depth estimation has attracted extensive attention in recent years. Lightweight depth estimation methods are crucial for resource-constrained edge devices. However, existing lightweight methods often encounter the challenge of limited representation capacity and increased computational resource consumption for image recons...
Segmentation of independently moving objects is an important stage in scene comprehension tasks like tracking and recognition. Frame-based cameras employed for dynamic scenes suffer from motion blur and exposure artifacts due to the sampling principle. In contrast, event-based cameras sample visual information based on scene dynamics and have the a...
Multi-scale methods are often used to improve the accuracy of detection models. However, this approach is usually computationally expensive. In this paper, we introduce an efficient Scale Divide and Conquer Network (ScaleDCNet) based on the keypoint object detection framework, which accomplishes independent detection at each scale with minimal cost...
Precise annotation of 6-D poses in real data is intricate and time-consuming, yet it is an essential requirement for training pose estimation pipelines. We propose a way for scalable, end-to-end 6-D pose regression with weak supervision to avoid this problem. Our method requires neither 3-D models nor 6-D object poses as ground truth. Instead, we use 2-...
Pose estimation of 3D objects in monocular images is a fundamental and long-standing problem in computer vision. Existing deep learning approaches for 6D pose estimation typically rely on the assumption of availability of 3D object models and 6D pose annotations. However, precise annotation of 6D poses in real data is intricate, time-consuming and...
Concrete crack detection is critical to the maintenance of infrastructure. Neural network-based vision methods are widely used to address this challenging task. Current supervised learning methods rely heavily on a large amount of labeled data. To tackle this problem, we propose an unsupervised concrete crack detection method based on nnU-Net, in...
One-shot person re-identification (Re-ID) is a hot spot nowadays, where there is only one labeled image along with many unlabeled images for each identity. Due to the shortage of labeled training images, it is hard to match the performance achieved under full supervision. In this paper, we propose a progressive method with identity-based data augmentation t...
Deep reinforcement learning has made significant progress in multi-agent tasks in recent years. However, most previous studies focus on solving fully cooperative tasks, which do not perform well in mixed tasks. In mixed tasks, the agent needs to comprehensively consider the information provided by its friends and enemies to learn its strategy, and...
One-shot person re-identification, in which each identity has one labeled sample among numerous unlabeled data, is proposed to tackle the shortage of labeled data. In scenarios without sufficient labeled data, it is very challenging to keep pace with the performance of the supervised task in which sufficient labeled s...
In multi-agent environments, cooperation is crucially important, and the key is to understand the mutual interplay between agents. However, multi-agent environments are highly dynamic, where the complex relationships between agents cause great difficulty for policy learning, and it’s costly to take all coagents into consideration. Besides, agents m...
Initial position estimation in global maps, which is a prerequisite for accurate localization, plays a critical role in mobile robot navigation tasks. Global positioning system signals often become unreliable in disaster sites or indoor areas, which require other localization methods to help the robot in searching and rescuing. Many visual-based ap...
The performance of decentralized multi-agent systems tends to benefit from information sharing and its effective utilization. However, excessive or unnecessary sharing may hinder performance due to the delay, instability and additional overhead of communications. Aiming for satisfactory coordination performance, one would prefer the cost of comm...
This paper presents a real-time monocular vision solution to human posture recognition for human-drone interaction. The approach achieves a more natural interaction between human and drone. Image regions and joint positions of human bodies in images from a monocular camera mounted on a micro drone are extracted by using a deep neural network. Then...
Three-dimensional (3D) point cloud understanding is important for autonomous robots. However, point clouds are normally irregular and discrete. It is challenging to obtain semantic information from them. In this paper, we present a method to build a dense semantic map, which utilizes both two-dimensional (2D) image labels and 3D geometric informati...
With the single-robot visual SLAM method reaching maturity, the issue of collaboratively exploring unknown environments by multiple robots attracts increasing attention. In this paper, we present CORB-SLAM, a novel collaborative multi-robot visual SLAM system providing map fusing and map sharing capabilities. Experimental results on popular public...
For efficient path planning of ground robots in three dimensional (3D) environments with structures like buildings or overhanging objects, an appropriate spatial representation of the environment is normally required. Some popular representations, like elevation maps and multi-level surface maps, need to be projected into a 2D plane to extract trav...
Simultaneous localization and mapping (SLAM) systems are widely used for self-localization of a robot, which is the basis of autonomous navigation. However, state-of-the-art SLAM systems do not suffice when navigating in large-scale environments due to memory limits and localization errors. In this paper, we propose a Geographic Information Syst...
Three-dimensional (3D) laser-based simultaneous localization and mapping (SLAM) can provide real-time pose information and construct an accurate 3D map. However, detecting loop closures is a challenge in 3D laser-based SLAM because of the high computational cost of the algorithms. In this paper, we propose a visual method to detect and correct loop closures. We in...
Memory consumption of visual SLAM systems grows rapidly as their operation ranges increase. A well-designed organization scheme of map data is important for the scalability of visual SLAM in large-scale environments. In this paper, we present a novel visual SLAM system with an efficient memory management method to manage the map data using a spat...
For path planning of mobile robots in complex unknown three-dimensional (3D) environments, an accurate 3D volumetric representation of the environment is usually required. In this paper, we present a novel method to produce globally consistent 3D grid maps in real-time, through a grid-map update strategy and an efficient data structure. We transfor...
With the single-robot visual SLAM method reaching maturity, the issue of collaboratively exploring unknown environments by multiple robots attracts increasing attention. In this paper, we present CORB-SLAM, a novel collaborative multi-robot visual SLAM system providing map fusing and map sharing capabilities. Experimental results on popular public...
Different sensors have different capacities and may even fail in different scenes. This influences the accuracy and reliability of SLAM systems utilizing those sensors. To enhance the robustness of SLAM systems, we present SceneSLAM in this paper, which is a novel extensible SLAM framework combining different SLAM systems facilitated by a scene de...
In this paper, we present a solution to 3D mapping using a 2D laser scanner with pose estimates from an IMU-aided visual SLAM system. Accurate motion estimation of a robot is achieved by visual-inertial fusion based on an extended Kalman filter (EKF). Range measurements scanned on the vertical plane are received constantly by a 2D laser scanner mou...
Visual odometry is a core component of many visual navigation systems like visual simultaneous localization and mapping (SLAM). Grid cells have been found as part of the path integration system in the rat’s entorhinal cortex, and they provide inputs for place cells in the rat’s hippocampus. Together with other cells, they constitute a positioning s...