Junjun Jiang
  • PhD
  • Professor (Full) at Harbin Institute of Technology

About

337 Publications
107,015 Reads
18,609 Citations
Introduction
Junjun Jiang currently works at the School of Computer Science and Technology, Harbin Institute of Technology. His current projects are image super-resolution and hyperspectral image classification. For more information, please visit his homepage: http://homepage.hit.edu.cn/jiangjunjun
Current institution
Harbin Institute of Technology
Current position
  • Professor (Full)
Additional affiliations
July 2018 - present
Harbin Institute of Technology
Position
  • Professor (Full)
October 2016 - present
National Institute of Informatics
Position
  • Project Researcher
January 2015 - June 2018
China University of Geosciences
Position
  • Professor (Associate)
Education
July 2009 - December 2014
Wuhan University
Field of study
  • Image Super-Resolution

Publications

Publications (337)
Article
Full-text available
Recently, position-patch based approaches have been proposed to replace the probabilistic graph-based or manifold learning-based models for face hallucination. In order to obtain the optimal weights for face hallucination, these approaches represent one image patch through other patches at the same position of training faces by employing least squar...
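To make the position-patch idea above concrete, the sketch below shows the core least-squares step under simple assumptions (NumPy, a single patch position, illustrative names not taken from the paper): the input low-resolution patch is coded over the training LR patches at the same position, and the resulting weights are reused on the aligned HR patches.

import numpy as np

def hallucinate_patch(x_lr, D_lr, D_hr):
    # x_lr: (d,) input LR patch; D_lr: (d, m) LR training patches at the
    # same position; D_hr: (D, m) aligned HR training patches.
    # Solve min_w ||x_lr - D_lr w||^2 with ordinary least squares.
    w, *_ = np.linalg.lstsq(D_lr, x_lr, rcond=None)
    # Transfer the weights to the HR patches to hallucinate the HR patch.
    return D_hr @ w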
Article
Full-text available
Based on the assumption that Low-Resolution (LR) and High-Resolution (HR) manifolds are locally isometric, the neighbor embedding super-resolution algorithms try to preserve the geometry (reconstruction weights) of the LR space for the reconstructed HR space, but neglect the geometry of the original HR space. Due to the degradation process of the L...
Article
Full-text available
Face image super-resolution has attracted much attention in recent years. Many algorithms have been proposed. Among them, sparse representation (SR)-based face image super-resolution approaches are able to achieve competitive performance. However, these SR-based approaches only perform well under the condition that the input is noiseless or has sma...
Article
Full-text available
The performance of traditional face recognition systems is sharply reduced when encountered with a low-resolution (LR) probe face image. To obtain much more detailed facial features, some face super-resolution (SR) methods have been proposed in the past decade. The basic idea of a face image SR is to generate a high-resolution (HR) face image from...
Article
We propose Pixel2Pixel, a novel zero-shot image denoising framework that leverages the non-local self-similarity of images to generate a large number of training samples using only the input noisy image. This framework employs a compact convolutional neural network architecture to achieve high-quality image denoising. Given a single observed noisy...
Article
In this paper, we introduce MaeFuse, a novel autoencoder model designed for Infrared and Visible Image Fusion (IVIF). The existing approaches for image fusion often rely on training combined with downstream tasks to obtain high-level visual information, which is effective in emphasizing target objects and delivering impressive results in visual qua...
Preprint
Learned lossless image compression has achieved significant advancements in recent years. However, existing methods often rely on training amortized generative models on massive datasets, resulting in sub-optimal probability distribution estimation for specific testing images during the encoding process. To address this challenge, we explore the connec...
Article
Recent advances in learning-based methods have markedly enhanced the capabilities of image compression. However, these methods struggle with high bit-depth volumetric medical images, facing issues such as degraded performance, increased memory demand, and reduced processing speed. To address these challenges, this paper presents the Bit-Division ba...
Preprint
Full-text available
Existing face super-resolution (FSR) methods have made significant advancements, but they primarily super-resolve face with limited visual information, original pixel-wise space in particular, commonly overlooking the pluralistic clues, like the higher-order depth and semantics, as well as non-visual inputs (text caption and description). Consequen...
Preprint
Full-text available
Learning a self-supervised Monocular Depth Estimation (MDE) model with great generalization remains significantly challenging. Despite the success of adversarial augmentation in the supervised learning generalization, naively incorporating it into self-supervised MDE models potentially causes over-regularization, suffering from severe performance d...
Article
Degradation under challenging conditions such as rain, haze, and low light not only diminishes content visibility, but also results in additional degradation side effects, including detail occlusion and color distortion. However, current technologies have barely explored the correlation between perturbation removal and background restoration, conse...
Article
The spatial and spectral features of hyperspectral images exhibit complementarity, and neglecting them prevents the full exploitation of useful information for superresolution. This article proposes a spatial-spectral middle cross-attention fusion network to explore the spatial-spectral structure correlation. Initially, we learn spatial and spectra...
Preprint
Recent advances in learning-based methods have markedly enhanced the capabilities of image compression. However, these methods struggle with high bit-depth volumetric medical images, facing issues such as degraded performance, increased memory demand, and reduced processing speed. To address these challenges, this paper presents the Bit-Division ba...
Preprint
Full-text available
Image fusion is well known as an alternative solution for generating one high-quality image from multiple images, in addition to image restoration from a single degraded image. The essence of image fusion is to integrate complementary information from source images. Existing fusion methods struggle with generalization across various tasks and often require...
Preprint
Radiance fields represented by 3D Gaussians excel at synthesizing novel views, offering both high training efficiency and fast rendering. However, with sparse input views, the lack of multi-view consistency constraints results in poorly initialized point clouds and unreliable heuristics for optimization and densification, leading to suboptimal perf...
Article
Transformer-based entropy models have gained prominence in recent years due to their superior ability to capture long-range dependencies in probability distribution estimation compared to convolution-based methods. However, previous transformer-based entropy models suffer from a sluggish coding process due to pixel-wise autoregression or duplicated c...
Preprint
Efficient storage of large-scale point cloud data has become increasingly challenging due to advancements in scanning technology. Recent deep learning techniques have revolutionized this field; however, most existing approaches rely on single-modality contexts, such as octree nodes or voxel occupancy, limiting their ability to capture information a...
Article
In recent years, many single hyperspectral image super-resolution methods have emerged to enhance the spatial resolution of hyperspectral images without hardware modification. However, existing methods typically face two significant challenges. First, they struggle to handle the high-dimensional nature of hyperspectral data, which often results in...
Article
Deep convolutional neural networks, particularly large models with large kernels (3 × 3 or more), have achieved significant progress in single image super-resolution (SISR) tasks. However, the heavy computational footprint of such models prevents their deployment in real-time, resource-constrained environments. Conversely, 1 × 1 convolutions have s...
Article
The recent emergence of deep learning neural networks has propelled advancements in the field of face super-resolution. While these deep learning-based methods have shown significant performance improvements, they depend overwhelmingly on fixed, spatially shared kernels within standard convolutional layers. This leads to a neglect of the diverse fa...
Article
Multi-focus image fusion can fuse the clear parts of two or more source images captured at the same scene with different focal lengths into an all-in-focus image. On the one hand, previous supervised learning-based multi-focus image fusion methods relying on synthetic datasets exhibit a clear distribution shift from real scenarios. On the other hand,...
Preprint
Neural Radiance Fields (NeRF) with hybrid representations have shown impressive capabilities in reconstructing scenes for view synthesis, delivering high efficiency. Nonetheless, their performance significantly drops with sparse view inputs, due to the issue of overfitting. While various regularization strategies have been devised to address these...
Article
Denoising diffusion probabilistic models (DDPM) have shown impressive performance in various domains as a class of deep generative models. In this paper, we introduce the Mixture noise-based DDPM (Mix-DDPM), which considers the Markov diffusion posterior as a Gaussian mixture model. Specifically, Mix-DDPM randomly selects a Gaussian component and t...
Article
Images captured under low-light conditions suffer from several combined degradation factors, including low brightness, low contrast, noise, and color bias. Many learning-based techniques attempt to learn the low-to-clear mapping between low-light and normal-light images. However, they often fall short when applied to low-light images taken in wide-...
Article
Spectral compressive imaging has emerged as a powerful technique to collect the 3D spectral information as 2D measurements. The algorithm for restoring the original 3D hyperspectral images (HSIs) from compressive measurements is pivotal in the imaging process. Early approaches painstakingly designed networks to directly map compressive measurements...
Article
Capturing human faces at night or in dimly lit environments has become common practice, accompanied by complex low-light and low-resolution degradations. However, the existing face super-resolution (FSR) technologies and derived cascaded schemes are inadequate to recover credible textures. In this paper, we propose a novel approach that decomposes...
Article
The wavelet transform has emerged as a powerful tool in deciphering structural information within images. And now, the latest research suggests that combining the prowess of wavelet transform with neural networks can lead to unparalleled image deraining results. By harnessing the strengths of both the spatial domain and frequency space, this innova...
Article
Contrastive learning has emerged as a prevailing paradigm for high-level vision tasks, which, by introducing properly negative samples, has also been exploited for low-level vision tasks to achieve a compact optimization space to account for their ill-posed nature. However, existing methods rely on manually predefined and task-oriented negatives, w...
Article
Image deblurring continues to achieve impressive performance with the development of generative models. Nonetheless, there still remains a displeasing problem if one wants to improve the perceptual quality and quantitative scores of the recovered image at the same time. In this study, drawing inspiration from research on transformer properties, we intr...
Article
Federated learning (FL) is a promising decentralized machine learning approach that enables multiple distributed clients to train a model jointly while keeping their data private. However, in real-world scenarios, the supervised training data stored in local clients inevitably suffer from imperfect annotations, resulting in subjective, inconsistent...
Article
When considering the temporal relationships, most previous video super-resolution (VSR) methods follow the iterative or recurrent framework. The iterative framework adopts neighboring low-resolution (LR) frames from a sliding window, while the recurrent framework utilizes the output generated in the previous SR procedure. The hybrid framework combi...
Article
Full-text available
Most of the existing remote sensing image Super-Resolution (SR) methods based on deep learning tend to learn the mapping from low-resolution (LR) images to High-Resolution (HR) images directly. However, they ignore the potential structure and texture consistency of LR and HR spaces, which causes the loss of high-frequency information and produces artifact...
Article
Deep learning has emerged as a prominent technique in the field of holographic imaging, owing to its rapidity and high performance. Prevailing deep neural networks employed for holographic image reconstruction predominantly rely on convolutional neural networks (CNNs). While CNNs have yielded impressive results, their intrinsic limitations, charact...
Article
Explicit image priors have made breakthrough progress in super-resolution due to the additional supervisory information they provide. However, existing explicit prior-guided super-resolution methods directly use the Gaussian gradient or Laplacian gradient prior, which cannot fit the gradient distribution of remote sensing images. Through the statist...
Article
Face super-resolution is a technology that transforms a low-resolution face image into the corresponding high-resolution one. In this paper, we build a novel parsing map guided face super-resolution network which extracts the face prior (i.e., parsing map) directly from the low-resolution face image for subsequent utilization. To exploit the extract...
Preprint
The wavelet transform has emerged as a powerful tool in deciphering structural information within images. And now, the latest research suggests that combining the prowess of wavelet transform with neural networks can lead to unparalleled image deraining results. By harnessing the strengths of both the spatial domain and frequency space, this innova...
Article
Full-text available
This paper aims to address the problem of supervised monocular depth estimation. We start with a meticulous pilot study to demonstrate that the long-range correlation is essential for accurate depth estimation. Moreover, the Transformer and convolution are good at long-range and close-range depth estimation, respectively. Therefore, we propose to a...
Conference Paper
Full-text available
Single image deraining aims to remove rain perturbation while restoring the clean background scene from a rain image. However, existing methods tend to produce blurry and over-smooth outputs, lacking some textural details. Wavelet transform can depict the contextual and textural information of an image at different levels, showing impressive capabi...
Article
Exemplar-based colorization is a challenging task, which attempts to add colors to the target grayscale image with the aid of a reference color image, so as to keep the target semantic content while with the reference color style. In order to achieve visually plausible chromatic results, it is important to sufficiently exploit the global color styl...
Article
Point cloud upsampling (PCU), which aims to generate dense and uniform point clouds from the captured sparse input of 3D sensors such as LiDAR, is a practical yet challenging task. It has potential applications in many real-world scenarios, such as autonomous driving, robotics, AR/VR, etc. Deep neural network based methods achieve remarkable succes...
Preprint
Full-text available
A large number of incremental learning algorithms have been proposed to alleviate the catastrophic forgetting issue that arises while dealing with sequential data on a time series. However, the adversarial robustness of incremental learners has not been widely verified, leaving potential security risks. Specifically, for poisoning-based backdoor attacks,...
Article
Full-text available
Recently, CNN-based methods for hyperspectral image super-resolution (HSISR) have achieved outstanding performance. Due to the multi-band property of hyperspectral images, 3D convolutions are natural candidates for extracting spatial–spectral correlations. However, pure 3D CNN models are rarely seen, since they are generally considered to be too co...
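As a rough illustration of why 3D convolutions suit the multi-band structure mentioned above, the snippet below (assuming PyTorch; the tensor sizes are arbitrary examples, not the paper's settings) applies a single 3D convolution over a hyperspectral cube so that spectral and spatial neighborhoods are filtered jointly.

import torch
import torch.nn as nn

# Hyperspectral cube laid out as (batch, channels=1, bands, height, width).
hsi = torch.randn(1, 1, 31, 64, 64)
# Kernel of size (7, 3, 3): 7 neighboring bands, 3x3 spatial window.
conv3d = nn.Conv3d(1, 16, kernel_size=(7, 3, 3), padding=(3, 1, 1))
features = conv3d(hsi)  # (1, 16, 31, 64, 64): joint spatial-spectral features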
Preprint
Face super-resolution is a technology that transforms a low-resolution face image into the corresponding high-resolution one. In this paper, we build a novel parsing map guided face super-resolution network which extracts the face prior (i.e., parsing map) directly from the low-resolution face image for subsequent utilization. To exploit the extract...
Article
Guided filter is a fundamental tool in computer vision and computer graphics, which aims to transfer structure information from the guide image to the target image. Most existing methods construct filter kernels from the guidance itself without considering the mutual dependency between the guidance and the target. However, since there typically exi...
Preprint
Full-text available
Image deblurring continues to achieve impressive performance with the development of generative models. Nonetheless, there still remains a displeasing problem if one wants to improve the perceptual quality and quantitative scores of the recovered image at the same time. In this study, drawing inspiration from research on transformer properties, we intr...
Preprint
Full-text available
In recent years, the use of large convolutional kernels has become popular in designing convolutional neural networks due to their ability to capture long-range dependencies and provide large receptive fields. However, the increase in kernel size also leads to a quadratic growth in the number of parameters, resulting in heavy computation and memory...
Preprint
Full-text available
In this paper, we improve the challenging monocular 3D object detection problem with a general semi-supervised framework. Specifically, having observed that the bottleneck of this task lies in lacking reliable and informative samples to train the detector, we introduce a novel, simple, yet effective `Augment and Criticize' framework that explores a...
Preprint
Guided depth map super-resolution (GDSR), which aims to reconstruct a high-resolution (HR) depth map from a low-resolution (LR) observation with the help of a paired HR color image, is a longstanding and fundamental problem that has attracted considerable attention from the computer vision and image processing communities. A myriad of novel and effectiv...
Chapter
Full-text available
Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks. Thus, it is crucial to have efficient and accurate depth estimation models that can run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to...
Article
Guided depth map super-resolution (GDSR), which aims to reconstruct a high-resolution (HR) depth map from a low-resolution (LR) observation with the help of a paired HR color image, is a longstanding and fundamental problem that has attracted considerable attention from the computer vision and image processing communities. A myriad of novel and effectiv...
Chapter
Monocular depth estimation is an essential task in the computer vision community. While numerous successful methods have obtained excellent results, most of them are computationally expensive and not applicable for real-time on-device inference. In this paper, we aim to address more practical applications of monocular depth estimation, where the...
Article
Supervised deep learning has achieved tremendous success in many computer vision tasks, which however is prone to overfit noisy labels. To mitigate the undesirable influence of noisy labels, robust loss functions offer a feasible approach to achieve noise-tolerant learning. In this work, we systematically study the problem of noise-tolerant learnin...
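For readers unfamiliar with the robust-loss idea referred to above, the sketch below implements one well-known noise-tolerant loss, the generalized cross-entropy of Zhang and Sabuncu, purely as an illustration (assuming PyTorch); it is not necessarily the loss studied in this work.

import torch
import torch.nn.functional as F

def generalized_cross_entropy(logits, targets, q=0.7):
    # Probability assigned to the ground-truth class of each sample.
    probs = F.softmax(logits, dim=1)
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    # L_q = (1 - p_y^q) / q interpolates between cross-entropy (q -> 0)
    # and the noise-tolerant mean absolute error (q = 1).
    return ((1.0 - p_y.clamp_min(1e-8) ** q) / q).mean()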
Article
Recently, super-resolution (SR) tasks for single hyperspectral images have been extensively investigated and significant progress has been made by introducing advanced deep learning-based methods. However, hyperspectral image SR is still a challenging problem because of the numerous narrow and successive spectral bands of hyperspectral images. Exis...
Article
Hyperspectral images (HSIs) have been widely used in earth observation because they contain continuous and detailed spectral information which is beneficial for the fine-grained diagnosis of the land cover. In the past few years, convolutional neural networks (CNNs) based methods show limitations in modeling spectral-wise long-range dependencies. R...
Article
Dimensionality reduction (DR) is important for feature extraction and classification of hyperspectral images (HSIs). Recently proposed superpixel-based DR models have shown promising performance, where superpixel segmentation techniques were applied to segment an HSI and then DR models like principal component analysis (PCA) or linear discriminant...
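The superpixel-based dimensionality reduction described above can be pictured with the short sketch below (assuming NumPy and scikit-learn, with a precomputed superpixel label map; all names are illustrative, not from the paper): PCA is fitted independently on the spectra inside each superpixel.

import numpy as np
from sklearn.decomposition import PCA

def superpixel_pca(hsi, segments, n_components=10):
    # hsi: (H, W, B) hyperspectral cube; segments: (H, W) superpixel labels.
    H, W, B = hsi.shape
    reduced = np.zeros((H, W, n_components))
    for label in np.unique(segments):
        mask = segments == label
        pixels = hsi[mask]                      # (n_pixels, B) spectra
        k = min(n_components, pixels.shape[0], B)
        comp = PCA(n_components=k).fit_transform(pixels)
        reduced[mask, :k] = comp                # per-superpixel features
    return reduced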
Article
Data augmentation (DA) is a widely used technique for enhancing the training of deep neural networks. Recent DA techniques which achieve state-of-the-art performance always meet the need for diversity in augmented training samples. However, an augmentation strategy that has a high diversity usually introduces out-of-distribution (OOD) augmented sam...
Preprint
Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks. Thus, it is crucial to have efficient and accurate depth estimation models that can run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to...
Chapter
Monocular 3D object detection (Mono3D) has achieved unprecedented success with the advent of deep learning techniques and emerging large-scale autonomous driving datasets. However, drastic performance degradation remains an under-studied challenge for practical cross-domain deployment due to the lack of labels on the target domain. In this paper, we f...
Chapter
Image fusion is well known as an alternative solution for generating one high-quality image from multiple images, in addition to image restoration from a single degraded image. The essence of image fusion is to integrate complementary information or the best parts from source images. Current fusion methods usually need a large number of paired samples or s...
Article
Existing face hallucination methods often achieve improved performance by regularizing the model with a facial prior. Most of them first estimate facial prior information and then leverage it to help predict the target high-resolution face image. However, the accuracy of prior estimation is difficult to guarantee, especially fo...
Article
Multi-focus image fusion (MFF) is an effective way to eliminate the out-of-focus blur generated in the imaging process. The difficulties in distinguishing different blur levels and the lack of real supervised data make multi-focus image fusion remain a challenging task after decades of research. According to deep image prior (DIP) (Ulyanov et al.,...
Preprint
Depth map estimation from images is an important task in robotic systems. Existing methods can be categorized into two groups including multi-view stereo and monocular depth estimation. The former requires cameras to have large overlapping areas and sufficient baseline between cameras, while the latter that processes each image independently can ha...
Article
Full-text available
Hyperspectral images (HSIs) contain spatially structured information and pixel-level sequential spectral attributes. The continuous spectral features contain hundreds of wavelength bands and the differences between spectra are essential for achieving fine-grained classification. Due to the limited receptive field of backbone networks, convolutional...
Article
Recently, facial priors (e.g., facial parsing maps and facial landmarks) have been widely employed in prior-guided face super-resolution (FSR) because they provide the location of facial components and facial structure information, and help predict the missing high-frequency (HF) information. However, most existing approaches suffer from two shortc...
Preprint
Full-text available
Monocular depth estimation is an essential task in the computer vision community. While numerous successful methods have obtained excellent results, most of them are computationally expensive and not applicable for real-time on-device inference. In this paper, we aim to address more practical applications of monocular depth estimation, where the...
Article
Dear Editor, This letter is concerned with self-supervised monocular depth estimation. To estimate uncertainty simultaneously, we propose a simple yet effective strategy to learn the uncertainty for self-supervised monocular depth estimation with the discrete strategy that explicitly associates the prediction and the uncertainty to train the networ...
Conference Paper
Rain streaks and background components in a rainy input are highly correlated, making the deraining task a composition of rain streak removal and background restoration. However, the correlation of these two components is barely considered, leading to unsatisfactory deraining results. To this end, we propose a dynamic associated network (DANet) to...
Article
We propose an end-to-end image compression and analysis model with Transformers, targeting the cloud-based image classification application. Instead of placing an existing Transformer-based image classification model directly after an image codec, we aim to redesign the Vision Transformer (ViT) model to perform image classification from the comp...
Article
3D meshes are widely employed to represent the geometric structure of 3D shapes. Due to limitations of scanning sensor precision and other issues, meshes are inevitably affected by noise, which hampers subsequent applications. Convolutional neural networks (CNNs) achieve great success in image processing tasks, including 2D image denoising, and have...
Article
Pre-training has become a standard paradigm in many computer vision tasks. However, most of the methods are generally designed on the RGB image domain. Due to the discrepancy between the two-dimensional image plane and the three-dimensional space, such pre-trained models fail to perceive spatial information and serve as sub-optimal solutions for 3D...
Preprint
One of the main challenges for feature representation in deep learning-based classification is the design of appropriate loss functions that exhibit strong discriminative power. The classical softmax loss does not explicitly encourage discriminative learning of features. A popular direction of research is to incorporate margins in well-established...
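As context for the margin-based direction mentioned above, the sketch below shows one widely used variant, an additive-margin (cosine-margin) softmax loss, assuming PyTorch; it illustrates the general idea rather than the specific loss proposed in this preprint.

import torch
import torch.nn.functional as F

def additive_margin_softmax(cosine, targets, margin=0.35, scale=30.0):
    # cosine: (N, C) cosine similarities between L2-normalized features
    # and L2-normalized class weights; targets: (N,) ground-truth labels.
    one_hot = F.one_hot(targets, num_classes=cosine.size(1)).float()
    # Subtract the margin from the true-class similarity only, then rescale.
    logits = scale * (cosine - margin * one_hot)
    return F.cross_entropy(logits, targets)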
Preprint
The success of deep neural networks greatly relies on the availability of large amounts of high-quality annotated data, which however are difficult or expensive to obtain. The resulting labels may be class imbalanced, noisy or human biased. It is challenging to learn unbiased classification models from imperfectly annotated datasets, on which we us...
Preprint
Single-cell RNA sequencing (scRNA-seq) reveals the heterogeneity and diversity among individual cells and allows researchers to conduct cell-wise analysis. Clustering analysis is a fundamental step in analyzing scRNA-seq data, which is needed in many downstream tasks. Recently, some deep clustering based methods exhibit very good performance by combini...
Conference Paper
Full-text available
This paper presents results from the third Thermal Image Super-Resolution (TISR) challenge organized in the Perception Beyond the Visible Spectrum (PBVS) 2022 workshop. The challenge uses the same thermal image dataset as the first two challenges, with 951 training images and 50 validation images at each resolution. A set of 20 images was kept asid...
Article
Person re-identification (ReID) aims at searching the same identity person among images captured by various cameras. Existing fully supervised person ReID methods usually suffer from poor generalization capability caused by domain gaps. Unsupervised person ReID has attracted a lot of attention recently, because it works without intensive manual ann...
