Haibin Ling

Haibin Ling
Temple University | TU · Department of Computer and Information Science

About

203
Publications
60,159
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
11,765
Citations
Citations since 2016
110 Research Items
7975 Citations
201620172018201920202021202202004006008001,0001,2001,400
201620172018201920202021202202004006008001,0001,2001,400
201620172018201920202021202202004006008001,0001,2001,400
201620172018201920202021202202004006008001,0001,2001,400

Publications

Publications (203)
Article
Modern top-performing object detectors depend heavily on backbone networks, whose advances bring consistent performance gains through exploring more effective network structures. In this paper, we propose a novel and flexible backbone framework, namely CBNet, to construct high-performance detectors using existing open-source pre-trained backbones u...
Article
This study proposes a unified gradient- and intensity-discriminator generative adversarial network for various image fusion tasks, including infrared and visible image fusion, medical image fusion, multi-focus image fusion, and multi-exposure image fusion. On the one hand, we unify all fusion tasks into discriminating a fused image’s gradient and i...
Preprint
Full-text available
In a data-driven paradigm, machine learning (ML) is the central component for developing accurate and universal exchange-correlation (XC) functionals in density functional theory (DFT). It is well known that XC functionals must satisfy several exact conditions and physical constraints, such as density scaling, spin scaling, and derivative discontin...
Preprint
Convolutional neural networks have made significant progresses in edge detection by progressively exploring the context and semantic features. However, local details are gradually suppressed with the enlarging of receptive fields. Recently, vision transformer has shown excellent capability in capturing long-range dependencies. Inspired by this, we...
Article
Full-text available
Development of next-generation electronic devices calls for the discovery of quantum materials hosting novel electronic, magnetic, and topological properties. Traditional electronic structure methods require expensive computation time and memory consumption, thus a fast and accurate prediction model is desired with increasing importance. Representi...
Article
Cryo-electron microscopy (cryo-EM) is a Nobel Prize-winning technique for determining high-resolution 3D structures of biological macromolecules. A 3D structure is reconstructed from hundreds of thousands of noisy 2D projection images. However, existing 3D reconstruction methods are still time-consuming, and one of the major computational bottlenec...
Article
In this paper, we present ARCHIE++, a testing framework for conducting AR system testing and collecting user feedback in the wild. We begin by presenting a set of current trends in performing human testing of AR systems, identified by reviewing a selection of recent work from leading conferences in mixed reality, human factors, and mobile and perva...
Preprint
Transformer has recently demonstrated clear potential in improving visual tracking algorithms. Nevertheless, existing transformer-based trackers mostly use Transformer to fuse and enhance the features generated by convolutional neural networks (CNNs). By contrast, in this paper, we propose a fully attentional-based Transformer tracking algorithm, S...
Conference Paper
Full-text available
In this work, we consider the problem of self-supervised Moving Object Detection (MOD) in video, where no ground truth is involved in both training and inference phases. Recently , an adversarial learning framework is proposed [32] to leverage inherent temporal information for MOD. While showing great promising results, it uses single scale tempora...
Article
Image fusion synthesizes a new image from multiple images of the same scene. The synthesized image should be suitable for human visual perception and follow-up high-level image-processing tasks. However, existing methods focus on fusing low-level features, ignoring high-level semantic perception information. We propose a new end-to-end model to obt...
Article
Full-text available
Drones, or general UAVs, equipped with cameras have been fast deployed with a wide range of applications, including agriculture, aerial photography, and surveillance. Consequently, automatic understanding of visual data collected from drones becomes highly demanding, bringing computer vision and drones more and more closely. To promote and track th...
Preprint
Many critical applications rely on cameras to capture video footage for analytical purposes. This has led to concerns about these cameras accidentally capturing more information than is necessary. In this paper, we propose a deep learning approach towards protecting privacy in camera-based systems. Instead of specifying specific objects (e.g. faces...
Preprint
Full-text available
While deep neural networks have shown impressive performance in many tasks, they are fragile to carefully designed adversarial attacks. We propose a novel adversarial training-based model by Attention Guided Knowledge Distillation and Bi-directional Metric Learning (AGKD-BML). The attention knowledge is obtained from a weight-fixed model trained on...
Preprint
As a fundamental building block in computer vision, edges can be categorised into four types according to the discontinuity in surface-Reflectance, Illumination, surface-Normal or Depth. While great progress has been made in detecting generic or individual types of edges, it remains under-explored to comprehensively study all four edge types togeth...
Preprint
Full-text available
Crowd counting on the drone platform is an interesting topic in computer vision, which brings new challenges such as small object inference, background clutter and wide viewpoint. However, there are few algorithms focusing on crowd counting on the drone-captured data due to the lack of comprehensive datasets. To this end, we collect a large-scale d...
Article
Full-text available
In an augmented reality (AR) application, placing labels in a manner that is clear and readable without occluding the critical information from the real world can be a challenging problem. This paper introduces a label placement technique for AR used in street view scenarios. We propose a semantic-aware task-specific label placement method by ident...
Article
Despite that efforts have shifted to learning local descriptors with convolutional neural network (CNN) from hand-crafted realm, the inherent feature hierarchy within CNN has been rarely explored. To increase both the invariant and discriminative abilities of the CNN-based local descriptors by making use of the complementary representation powers o...
Preprint
Full-text available
Camera pose estimation or camera relocalization is the centerpiece in numerous computer vision tasks such as visual odometry, structure from motion (SfM) and SLAM. In this paper we propose a neural network approach with a graph transformer backbone, namely TransCamP, to address the camera relocalization problem. In contrast with prior work where th...
Article
Mail privacy protection aims to prevent unauthorized access to hidden content within an envelope since normal paper envelopes are not as safe as we think. In this paper, for the first time, we show that with a well designed deep learning model, the hidden content may be largely recovered without opening the envelope. We start by modeling deep learn...
Article
Planar object tracking is an important problem in vision-based robotic systems. Several benchmarks have been constructed to evaluate the tracking algorithms. However, these benchmarks are built in constrained laboratory environments and there is a lack of video sequences captured in the wild to investigate the effectiveness of trackers in practical...
Preprint
Full-text available
Tracking multiple objects in videos relies on modeling the spatial-temporal interactions of the objects. In this paper, we propose a solution named Spatial-Temporal Graph Transformer (STGT), which leverages powerful graph transformers to efficiently model the spatial and temporal interactions among the objects. STGT effectively models the interacti...
Article
Image-based relighting, projector compensation and depth/normal reconstruction are three important tasks of projector-camera systems (ProCams) and spatial augmented reality (SAR). Although they share a similar pipeline of finding projector-camera image mappings, in tradition, they are addressed independently, sometimes with different prerequisites,...
Preprint
Full-text available
Rotation averaging is a synchronization process on single or multiple rotation groups, and is a fundamental problem in many computer vision tasks such as multi-view structure from motion (SfM). Specifically, rotation averaging involves the recovery of an underlying pose-graph consistency from pairwise relative camera poses. Specifically, given pair...
Article
Full-text available
Despite great recent advances in visual tracking, its further development, including both algorithm design and evaluation, is limited due to lack of dedicated large-scale benchmarks. To address this problem, we present LaSOT, a high-quality Large-scale Single Object Tracking benchmark. LaSOT contains a diverse selection of 85 object classes, and of...
Article
Full-text available
Full projector compensation aims to modify a projector input image to compensate for both geometric and photometric disturbance of the projection surface. Traditional methods usually solve the two parts separately and may suffer from suboptimal solutions. In this paper, we propose the first end-to-end differentiable solution, named CompenNeSt++, to...
Preprint
Mail privacy protection aims to prevent unauthorized access to hidden content within an envelope since normal paper envelopes are not as safe as we think. In this paper, for the first time, we show that with a well designed deep learning model, the hidden content may be largely recovered without opening the envelope. We start by modeling deep learn...
Preprint
Light-based adversarial attacks aim to fool deep learning-based image classifiers by altering the physical light condition using a controllable light source, e.g., a projector. Compared with physical attacks that place carefully designed stickers or printed adversarial objects, projector-based ones obviate modifying the physical entities. Moreover,...
Preprint
High quality object proposals are crucial in visual tracking algorithms that utilize region proposal network (RPN). Refinement of these proposals, typically by box regression and classification in parallel, has been popularly adopted to boost tracking performance. However, it still meets problems when dealing with complex and dynamic background. Th...
Preprint
Multiple Object Tracking (MOT) has witnessed remarkable advances in recent years. However, existing studies dominantly request prior knowledge of the tracking target, and hence may not generalize well to unseen categories. In contrast, Generic Multiple Object Tracking (GMOT), which requires little prior information about the target, is largely unde...
Preprint
Full-text available
Visual tracking has achieved considerable progress in recent years. However, current research in the field mainly focuses on tracking of opaque objects, while little attention is paid to transparent object tracking. In this paper, we make the first attempt in exploring this problem by proposing a Transparent Object Tracking Benchmark (TOTB). Specif...
Preprint
As an essential part of structure from motion (SfM) and Simultaneous Localization and Mapping (SLAM) systems, motion averaging has been extensively studied in the past years and continues to attract surging research attention. While canonical approaches such as bundle adjustment are predominantly inherited in most of state-of-the-art SLAM systems t...
Chapter
Real-world data often follow a long-tailed distribution as the frequency of each class is typically different. For example, a dataset can have a large number of under-represented classes and a few classes with more than sufficient data. However, a model to represent the dataset is usually expected to have reasonably homogeneous performances across...
Preprint
Full-text available
Despite great recent advances in visual tracking, its further development, including both algorithm design and evaluation, is limited due to lack of dedicated large-scale benchmarks. To address this problem, we present LaSOT, a high-quality Large-scale Single Object Tracking benchmark. LaSOT contains a diverse selection of 85 object classes, and of...
Article
The goal of our work is to discover dominant objects in a very general setting where only a single unlabeled image is given. This is far more challenge than typical colocalization or weakly-supervised localization tasks. To tackle this problem, we propose a simple but effective pattern mining-based method, called Object Location Mining (OLM), which...
Preprint
Real-world data often follow a long-tailed distribution as the frequency of each class is typically different. For example, a dataset can have a large number of under-represented classes and a few classes with more than sufficient data. However, a model to represent the dataset is usually expected to have reasonably homogeneous performances across...
Preprint
Full projector compensation aims to modify a projector input image to compensate for both geometric and photometric disturbance of the projection surface. Traditional methods usually solve the two parts separately and may suffer from suboptimal solutions. In this paper, we propose the first end-to-end differentiable solution, named CompenNeSt++, to...
Article
Full-text available
This study proposes a novel unified and unsupervised end-to-end image fusion network, termed as U2Fusion, which is capable of solving different fusion problems, including multi-modal, multi-exposure, and multi-focus cases. Using feature extraction and information measurement, U2Fusion automatically estimates the importance of corresponding source i...
Article
Existing projector-camera calibration methods typically warp keypoints from a camera image to a projector image using estimated homographies and often suffer from errors in camera parameters and noises due to imperfect planarity of the calibration target. This article proposes a practical and robust projector-camera calibration system that explicit...
Preprint
Full-text available
Development of next-generation electronic devices for applications call for the discovery of quantum materials hosting novel electronic, magnetic, and topological properties. Traditional electronic structure methods require expensive computation time and memory consumption, thus a fast and accurate prediction model is desired with increasing import...
Preprint
Full-text available
Existing CNN-based methods for pixel labeling heavily depend on multi-scale features to meet the requirements of both semantic comprehension and detail preservation. State-of-the-art pixel labeling neural networks widely exploit conventional scale-transfer operations, i.e., up-sampling and down-sampling to learn multi-scale features. In this work,...
Article
Image alignment/registration/correspondence is a critical prerequisite for many vision-based tasks, and it has been widely studied in computer vision. However, aligning images from different domains, such as cross-weather/season road scenes, remains a challenging problem. Inspired by the success of classic intensity-constancy-based image alignment...
Preprint
In projector-camera systems, light transport models the propagation from projector emitted radiance to camera-captured irradiance. In this paper, we propose the first end-to-end trainable solution named Deep Light Transport (DeLTra) that estimates radiometrically uncalibrated projector-camera light transport. DeLTra is designed to have two modules:...
Article
Deep networks have been used for semantic segmentation tasks on scenes of outdoor environments with increasing popularity. However, the majority of existing work centers on daytime scenes with favorable illumination and weather conditions, and relies on supervision with pixel-level annotations. This paper seeks to address the problem of semantic se...
Article
Full-text available
The multi-dimensional assignment problem is universal for data association analysis such as data association-based visual multi-object tracking and multi-graph matching. In this paper, multi-dimensional assignment is formulated as a rank-1 tensor approximation problem. A dual L1-normalized context/hyper-context aware tensor power iteration optimiza...
Chapter
The Vision Meets Drone (VisDrone2020) Single Object Tracking is the third annual UAV tracking evaluation activity organized by the VisDrone team, in conjunction with European Conference on Computer Vision (ECCV 2020). The VisDrone-SOT2020 Challenge presents and discusses the results of 13 participating algorithms in detail. By using ensemble of dif...
Chapter
Full-text available
The Visual Object Tracking challenge VOT2020 is the eighth annual tracker benchmarking activity organized by the VOT initiative. Results of 58 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in the recent years. The VOT2020 challenge was composed of five sub-challenges focusin...
Preprint
In an augmented reality (AR) application, placing labels in a manner that is clear and readable without occluding the critical information from the real-world can be a challenging problem. This paper introduces a label placement technique for AR used in street view scenarios. We propose a semantic-aware task-specific label placement method by ident...
Preprint
Full-text available
Feature pyramid architecture has been broadly adopted in object detection and segmentation to deal with multi-scale problem. However, in this paper we show that the capacity of the architecture has not been fully explored due to the inadequate utilization of the supervision information. Such insufficient utilization is caused by the supervision sig...
Preprint
Full-text available
Generic visual tracking is difficult due to many challenge factors (e.g., occlusion, blur, etc.). Each of these factors may cause serious problems for a tracking algorithm, and when they work together can make things even more complicated. Despite a great amount of efforts devoted to understanding the behavior of tracking algorithms, reliable and q...
Data
supplementary of Clustered Object Detection in Aerial Image
Preprint
Full projector compensation aims to modify a projector input image such that it can compensate for both geometric and photometric disturbance of the projection surface. Traditional methods usually solve the two parts separately, although they are known to correlate with each other. In this paper, we propose the first end-to-end solution, named Comp...
Article
Full-text available
High-order motion information is important in multi-target tracking (MTT) especially when dealing with large inter-target ambiguities. Such high-order information can be naturally modeled as a multi-dimensional assignment (MDA) problem, whose global solution is however intractable in general. In this paper, we propose a novel framework to the probl...
Article
Full-text available
A hardware-in-loop control framework with robot dynamic models, pursuit–evasion game models, sensor and information solutions, and entity tracking algorithms is designed and developed to demonstrate discrete-time robotic pursuit–evasion games for real-world conditions. A parameter estimator is implemented to learn the unknown parameters in the robo...
Article
Full-text available
Pavement crack detection is a critical task for insuring road safety. Manual crack detection is extremely time-consuming. Therefore, an automatic road crack detection method is required to boost this progress. However, it remains a challenging task due to the intensity inhomogeneity of cracks and complexity of the background, e.g., the low contrast...
Preprint
Full-text available
Detecting objects in aerial images is challenging for at least two reasons: (1) target objects like pedestrians are very small in pixels, making them hardly distinguished from surrounding background; and (2) targets are in general sparsely and non-uniformly distributed, making the detection very inefficient. In this paper, we address both issues in...
Preprint
Data association-based multiple object tracking (MOT) involves multiple separated modules processed or optimized differently, which results in complex method design and requires non-trivial tuning of parameters. In this paper, we present an end-to-end model, named FAMNet, where Feature extraction, Affinity estimation and Multi-dimensional assignmen...
Preprint
Full-text available
Projector photometric compensation aims to modify a projector input image such that it can compensate for disturbance from the appearance of projection surface. In this paper, for the first time, we formulate the compensation problem as an end-to-end learning problem and propose a convolutional neural network, named CompenNet, to implicitly learn t...
Preprint
Recent progresses in model-free single object tracking (SOT) algorithms have largely inspired applying SOT to \emph{multi-object tracking} (MOT) to improve the robustness as well as relieving dependency on external detector. However, SOT algorithms are generally designed for distinguishing a target from its environment, and hence meet problems when...
Preprint
Full-text available
Pavement crack detection is a critical task for insuring road safety. Manual crack detection is extremely time-consuming. Therefore, an automatic road crack detection method is required to boost this progress. However, it remains a challenging task due to the intensity inhomogeneity of cracks and complexity of the background, e.g., the low contrast...