Jianmin Jiang’s research while affiliated with Shenzhen University and other places


Publications (97)


Interpretable Optimization-Inspired Unfolding Network for Low-Light Image Enhancement
  • Article

January 2025 · 1 Read · IEEE Transactions on Pattern Analysis and Machine Intelligence

Wenhui Wu · Jian Weng · [...] · Jianmin Jiang

Retinex model-based methods have been shown to be effective in layer-wise manipulation with well-designed priors for low-light image enhancement (LLIE). However, the hand-crafted priors and conventional optimization algorithms adopted to solve the layer decomposition problem result in a lack of adaptivity and efficiency. To this end, this paper proposes a Retinex-based deep unfolding network (URetinex-Net++), which unfolds an optimization problem into a learnable network that decomposes a low-light image into reflectance and illumination layers. By formulating the decomposition problem as a model regularized by implicit priors, three learning-based modules are carefully designed, responsible for data-dependent initialization, highly efficient unfolding optimization, and flexible component adjustment, respectively. In particular, the proposed unfolding optimization module, which introduces two networks to adaptively fit implicit priors in a data-driven manner, realizes noise suppression and detail preservation for the decomposed components. URetinex-Net++ is a further augmented version of URetinex-Net, introducing a cross-stage fusion block to alleviate the color defect in URetinex-Net. Boosted LLIE performance is thereby obtained in both visual quality and quantitative metrics, while only a few parameters and little extra runtime are introduced. Extensive experiments on real-world low-light images qualitatively and quantitatively demonstrate the effectiveness and superiority of the proposed URetinex-Net++ over state-of-the-art methods.
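For readers unfamiliar with deep unfolding, the sketch below illustrates the general idea of one unfolding stage for the Retinex decomposition I = R ∘ L: a gradient step on the data-fidelity term followed by a learned proximal step standing in for the implicit prior. This is a minimal, hypothetical PyTorch sketch; the prox networks, step size, and update order are illustrative assumptions, not the authors' URetinex-Net++ architecture.

```python
# Minimal sketch of one Retinex unfolding stage (hypothetical, not URetinex-Net++).
import torch
import torch.nn as nn

class UnfoldingStage(nn.Module):
    """One illustrative unfolding stage for I = R * L (element-wise)."""

    def __init__(self, channels=3):
        super().__init__()
        # Small CNNs standing in for the learned implicit priors.
        self.prox_r = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1))
        self.prox_l = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))
        self.mu = nn.Parameter(torch.tensor(0.5))  # learnable step size

    def forward(self, img, refl, illum):
        # Gradient step on 0.5 * ||refl * illum - img||^2 w.r.t. refl,
        # followed by a learned proximal (prior) step.
        grad_r = (refl * illum - img) * illum
        refl = self.prox_r(refl - self.mu * grad_r)
        # Alternating update for the single-channel illumination map;
        # the channel mean aggregates the per-channel gradients.
        grad_l = ((refl * illum - img) * refl).mean(dim=1, keepdim=True)
        illum = self.prox_l(illum - self.mu * grad_l)
        return refl, illum

# Toy usage: initialize illumination with the channel-wise maximum.
stage = UnfoldingStage()
img = torch.rand(1, 3, 64, 64)
refl, illum = stage(img, img.clone(), img.max(dim=1, keepdim=True).values)
```

Stacking several such stages and training them end to end is what turns the iterative optimizer into a learnable network.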


Geometry-Aware Self-Supervised Indoor 360° Depth Estimation via Asymmetric Dual-Domain Collaborative Learning

January 2025 · 1 Read · IEEE Transactions on Multimedia

Being able to estimate monocular depth for spherical panoramas is of fundamental importance in 3D scene perception. However, spherical distortion severely limits the effectiveness of vanilla convolutions. To push the envelope of accuracy, recent approaches attempt to utilize tangent projection (TP) to estimate the depth of 360° images. Yet, these methods still suffer from discrepancies and inconsistencies among patch-wise tangent images, as well as the lack of accurate ground-truth depth maps in a supervised setting. In this paper, we propose a geometry-aware self-supervised 360° image depth estimation methodology that explores the complementary advantages of TP and equirectangular projection (ERP) through an asymmetric dual-domain collaborative learning strategy. Specifically, we first develop a lightweight asymmetric dual-domain depth estimation network, which aggregates depth-related features from a single TP domain and then produces depth distributions for the TP and ERP domains via collaborative learning. This effectively mitigates stitching artifacts and preserves fine details in depth inference without an excessive parameter budget. In addition, a frequency-spatial feature concentration module is devised to simultaneously capture non-local Fourier features and local spatial features, thereby facilitating the efficient exploration of monocular depth cues. Moreover, we introduce a geometric structural alignment module to further improve geometric structural consistency among tangent images. Extensive experiments illustrate that our designed approach outperforms existing self-supervised 360° depth estimation methods on three publicly available benchmark datasets.
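As background on the TP domain, the snippet below sketches the gnomonic projection that maps spherical directions onto a plane tangent to the sphere, which is how tangent-image patches are typically obtained from an ERP panorama. It is a minimal NumPy sketch of the standard textbook formulas; the patch layout and sampling used in the paper are not reproduced here.

```python
# Minimal gnomonic (tangent-plane) projection sketch; standard formulas only.
import numpy as np

def gnomonic_project(lat, lon, lat0, lon0):
    """Project sphere points onto the plane tangent at (lat0, lon0).

    All angles in radians; returns planar (x, y). Points more than 90 degrees
    from the tangent point (cos_c <= 0) are not representable on the plane.
    """
    cos_c = (np.sin(lat0) * np.sin(lat)
             + np.cos(lat0) * np.cos(lat) * np.cos(lon - lon0))
    x = np.cos(lat) * np.sin(lon - lon0) / cos_c
    y = (np.cos(lat0) * np.sin(lat)
         - np.sin(lat0) * np.cos(lat) * np.cos(lon - lon0)) / cos_c
    return x, y

# Example: project a small grid of ERP directions onto the tangent plane
# centred at the equator / prime meridian.
lats = np.linspace(-0.2, 0.2, 5)
lons = np.linspace(-0.2, 0.2, 5)
lat_g, lon_g = np.meshgrid(lats, lons, indexing="ij")
x, y = gnomonic_project(lat_g, lon_g, 0.0, 0.0)
```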



Dual-Clustered Conditioning Toward GAN-Based Diverse Image Generation

February 2024 · 3 Reads · 2 Citations · IEEE Transactions on Consumer Electronics

Generative Artificial Intelligence (AI) has revolutionized image generation in the realm of consumer electronics, illustrating its significant impact on product development and user experience. In this paper, we propose a class-conditioned GAN with dual clustering to leverage correlations across both the spatial and the approximated discrete cosine transform (ADCT) domains toward improved diverse image generation. By analyzing the spectral bias from a frequency perspective through clustering in the ADCT domain, the proposed method ensures that the class conditioning provided by pixel clustering is significantly strengthened and complemented by ADCT clustering. The sequential arrangement of ADCT clustering followed by pixel clustering not only optimizes their individual contributions and coordination, but also avoids the need to retrain the conditional generator and discriminator from scratch. Extensive experiments illustrate that, in terms of FID and IS measurements as well as synthesized quality, integrity, and diversity, our proposed method achieves significant superiority over the existing state of the art.
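To make the dual-clustering idea concrete, here is a hedged sketch of how such conditioning labels might be derived: images are clustered first by low-frequency block-DCT statistics and then by pixel statistics, and the two cluster ids are fused into one conditional class. The feature choices, cluster counts, and fusion rule are illustrative assumptions, not the paper's exact ADCT pipeline.

```python
# Hypothetical dual-clustering sketch: DCT-domain then pixel-domain k-means.
import numpy as np
from scipy.fft import dctn
from sklearn.cluster import KMeans

def dct_features(images):
    # images: (N, H, W) grayscale in [0, 1]; keep the low-frequency 8x8
    # corner of each 2-D DCT as a compact spectral descriptor.
    return np.stack([dctn(im, norm="ortho")[:8, :8].ravel() for im in images])

def dual_cluster_labels(images, k_dct=4, k_pix=4, seed=0):
    # Stage 1: spectral clustering in the (approximated) DCT domain.
    dct_ids = KMeans(k_dct, random_state=seed, n_init=10).fit_predict(
        dct_features(images))
    # Stage 2: pixel-domain clustering on downsampled intensities.
    pix = np.stack([im[::8, ::8].ravel() for im in images])
    pix_ids = KMeans(k_pix, random_state=seed, n_init=10).fit_predict(pix)
    # Fuse the two ids into a single conditional class label for the GAN.
    return dct_ids * k_pix + pix_ids

labels = dual_cluster_labels(np.random.rand(32, 64, 64))
```

Running the spectral stage first mirrors the sequential arrangement described in the abstract, where ADCT clustering refines the classes that pixel clustering then conditions on.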



Denoiser-Regulated Deep Unfolding Compressed Sensing with Learnable Fixed-Point Projections

January 2024 · 6 Reads · 1 Citation · IEEE Transactions on Circuits and Systems for Video Technology

The family of regularization by denoising (RED) methods introduces a denoising operator as the regularization term for compressed sensing (CS) reconstruction, offering high flexibility and scalability. However, the traditional RED framework imposes strict requirements on several properties of the denoiser, making it hard to design a suitable denoiser and limiting the quality of the reconstructed images. Although some of these requirements can be relaxed by incorporating a fixed-point projection into the iterative process, the parameters involved have a great impact on the effectiveness and efficiency of the algorithm and are non-trivial to set properly. In this paper, we propose an innovative deep unfolding network framework, termed FP-DUN, based on the iterative process of Regularization by Denoising via Fixed-Point Projection (RED-PRO). In FP-DUN, the fixed-point projection module is implemented with learnable neural-network weights, and an effective denoiser based on a dual attention mechanism (DAM) is developed to capture the details of the reconstructed image. Additionally, we propose a new loss function based on fixed-point constraints, which overcomes the over-smoothness caused by multi-stage denoising and maintains structural details to progressively improve the reconstruction quality. By training the DUN model, the parameters of the fixed-point projection and the denoiser are learned automatically. Extensive experimental comparisons with state-of-the-art CS algorithms and the traditional RED-PRO approach validate the effectiveness of FP-DUN, especially on images with complex details.
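For orientation, the sketch below shows the classic RED-style iteration that unfolding networks such as FP-DUN parameterize: a gradient step on the data-fidelity term plus a regularization step that pulls the iterate toward a fixed point of a denoiser. The Gaussian-filter denoiser, step sizes, and iteration count are hypothetical stand-ins, not the paper's learned dual-attention denoiser or its learnable projection.

```python
# Minimal RED-style CS reconstruction sketch (hypothetical denoiser and steps).
import numpy as np
from scipy.ndimage import gaussian_filter

def red_cs_reconstruct(y, A, shape, steps=200, mu=1e-2, lam=0.2):
    """y: measurements (m,), A: sensing matrix (m, n); returns flat image x."""
    x = A.T @ y  # simple adjoint-based initialization
    for _ in range(steps):
        denoised = gaussian_filter(x.reshape(shape), sigma=1.0).ravel()  # D(x)
        grad_data = A.T @ (A @ x - y)    # data-fidelity gradient
        grad_reg = lam * (x - denoised)  # RED regularization gradient
        x = x - mu * (grad_data + grad_reg)
    return x

# Toy usage: 25% Gaussian sampling of a 32x32 image.
rng = np.random.default_rng(0)
A = rng.standard_normal((256, 1024)) / np.sqrt(256)
x_true = rng.random(1024)
x_hat = red_cs_reconstruct(A @ x_true, A, shape=(32, 32))
```

Unfolding replaces the hand-set mu, lam, and denoiser with stage-wise learnable counterparts, which is the relaxation FP-DUN pursues.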


Weakly-Supervised 3D Scene Graph Generation via Visual-Linguistic Assisted Pseudo-Labeling

January 2024 · IEEE Transactions on Multimedia

Learning to build 3D scene graphs is essential for real-world perception in a structured and rich fashion. However, previous 3D scene graph generation methods follow a fully supervised learning paradigm and require a large amount of entity-level annotation data for objects and relations, which is extremely resource-consuming and tedious to obtain. To tackle this problem, we propose 3D-VLAP, a weakly-supervised 3D scene graph generation method via Visual-Linguistic Assisted Pseudo-labeling. Specifically, 3D-VLAP exploits the superior ability of current large-scale visual-linguistic models to align the semantics between texts and 2D images, as well as the naturally existing correspondences between 2D images and 3D point clouds, and thus implicitly constructs correspondences between texts and 3D point clouds. First, we establish the positional correspondence from 3D point clouds to 2D images via camera intrinsic and extrinsic parameters, thereby achieving alignment of 3D point clouds and 2D images. Subsequently, a large-scale cross-modal visual-linguistic model is employed to indirectly align 3D instances with the textual category labels of objects by matching 2D images with object category labels. Pseudo labels for objects and relations are then produced for 3D-VLAP model training by calculating the similarity between the visual embeddings and the textual category embeddings of objects and relations encoded by the visual-linguistic model, respectively. Ultimately, we design an edge self-attention based graph neural network to generate scene graphs of 3D point cloud scenes. Extensive experiments demonstrate that 3D-VLAP achieves comparable results to current advanced fully supervised methods while significantly alleviating the burden of data annotation.
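At its core, the pseudo-labeling step described above is a nearest-neighbor assignment in a shared embedding space. The sketch below shows that core under loudly stated assumptions: the instance and category-text embeddings are hypothetical inputs, presumed already extracted from a CLIP-like visual-linguistic model after the 3D-to-2D alignment.

```python
# Hypothetical pseudo-label assignment via cosine similarity of embeddings.
import numpy as np

def pseudo_labels(instance_emb, text_emb):
    """instance_emb: (N, d), text_emb: (C, d); returns (labels, confidence)."""
    inst = instance_emb / np.linalg.norm(instance_emb, axis=1, keepdims=True)
    text = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = inst @ text.T  # cosine similarity matrix, shape (N, C)
    # Each instance gets the most similar category; keep the score so that
    # low-confidence pseudo labels can be filtered before training.
    return sim.argmax(axis=1), sim.max(axis=1)

rng = np.random.default_rng(0)
labels, conf = pseudo_labels(rng.standard_normal((100, 512)),
                             rng.standard_normal((20, 512)))
```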



Fig. 1: Network architecture of the proposed CSRN at a sample ratio of 0.2. For the sampling process, two Conv(102, 32, 32) layers are applied to mimic image sampling. For reconstruction, the initial reconstruction sub-network with a two-layer structure is used to reconstruct the initial image, and then the residual reconstruction sub-network is used to recover the residual image, which is added to the initial image to obtain the final reconstructed image.
Fig. 2: The structure of the initial reconstruction sub-network when the sample ratio is greater than 0.1.
Fig. 3: Recurrent residual fusion module. The dotted line denotes the recurrent connection.
Fig. 4: Reconstruction performance on Set5 for different combinations of N and T.
Fig. 5: Visual quality comparison of different optimization- and plain-network-based CS algorithms on "Butterfly" from Set5 at a sample ratio of 0.05.

A Lightweight Recurrent Learning Network for Sustainable Compressed Sensing
  • Preprint
  • File available

April 2023 · 59 Reads

Recently, deep learning-based compressed sensing (CS) has achieved great success in reducing the sampling and computational cost of sensing systems and improving the reconstruction quality. These approaches, however, largely overlook the issue of the computational cost; they rely on complex structures and task-specific operator designs, resulting in extensive storage and high energy consumption in CS imaging systems. In this paper, we propose a lightweight but effective deep neural network based on recurrent learning to achieve a sustainable CS system; it requires a smaller number of parameters but obtains high-quality reconstructions. Specifically, our proposed network consists of an initial reconstruction sub-network and a residual reconstruction sub-network. While the initial reconstruction sub-network has a hierarchical structure to progressively recover the image, reducing the number of parameters, the residual reconstruction sub-network facilitates recurrent residual feature extraction via recurrent learning to perform both feature fusion and deep reconstructions across different scales. In addition, we also demonstrate that, after the initial reconstruction, feature maps with reduced sizes are sufficient to recover the residual information, and thus we achieved a significant reduction in the amount of memory required. Extensive experiments illustrate that our proposed model can achieve a better reconstruction quality than existing state-of-the-art CS algorithms, and it also has a smaller number of network parameters than these algorithms. Our source codes are available at: https://github.com/C66YU/CSRN.
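The recurrent residual fusion of Fig. 3 can be summarized as weight sharing across iterations: the same convolutional block is applied repeatedly, so effective depth grows without adding parameters. The PyTorch sketch below illustrates that principle only; the channel width, number of recurrences, and fusion details are illustrative assumptions, not the released CSRN code.

```python
# Minimal recurrent residual block sketch (illustrative, not the CSRN release).
import torch
import torch.nn as nn

class RecurrentResidualBlock(nn.Module):
    def __init__(self, channels=32, steps=4):
        super().__init__()
        self.steps = steps
        # The same two convolutions are reused at every recurrent step,
        # so the parameter count is independent of the number of steps.
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        h = x
        for _ in range(self.steps):
            h = x + self.body(h)  # residual connection at each recurrence
        return h

out = RecurrentResidualBlock()(torch.rand(1, 32, 16, 16))
```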


A Lightweight Recurrent Learning Network for Sustainable Compressed Sensing

January 2023 · 10 Reads · 3 Citations · IEEE Transactions on Emerging Topics in Computational Intelligence

Recently, deep learning-based compressed sensing (CS) has achieved great success in reducing the sampling and computational cost of sensing systems and improving the reconstruction quality. These approaches, however, largely overlook the issue of the computational cost; they rely on complex structures and task-specific operator designs, resulting in extensive storage and high energy consumption in CS imaging systems. In this article, we propose a lightweight but effective deep neural network based on recurrent learning to achieve a sustainable CS system; it requires a smaller number of parameters but obtains high-quality reconstructions. Specifically, our proposed network consists of an initial reconstruction sub-network and a residual reconstruction sub-network. While the initial reconstruction sub-network has a hierarchical structure to progressively recover the image, reducing the number of parameters, the residual reconstruction sub-network facilitates recurrent residual feature extraction via recurrent learning to perform both feature fusion and deep reconstructions across different scales. In addition, we also demonstrate that, after the initial reconstruction, feature maps with reduced sizes are sufficient to recover the residual information, and thus we achieved a significant reduction in the amount of memory required. Extensive experiments illustrate that our proposed model can achieve a better reconstruction quality than existing state-of-the-art CS algorithms, and it also has a smaller number of network parameters than these algorithms. Our source codes are available at: https://github.com/C66YU/CSRN.


Citations (70)


... Since continuation appeals to model-based methods, it appears intuitive to also apply it in the unfolding regime. Our work is inspired by [28], [29], [30]. In [28], continuation is utilized to warm-start the proposed model-based CS reconstruction algorithm. ...

Reference:

How to warm-start your unfolding network
Denoiser-Regulated Deep Unfolding Compressed Sensing with Learnable Fixed-Point Projections
  • Citing Article
  • January 2024

IEEE Transactions on Circuits and Systems for Video Technology

... The advent of deep learning has introduced new possibilities [17][18][19]. More recent studies have started exploring convolutional neural networks (CNNs) [20][21][22] and Generative Adversarial Networks (GAN) [23][24][25] for this purpose, achieving better accuracy while preserving important dermatological features. ...

Dual-Clustered Conditioning Toward GAN-Based Diverse Image Generation
  • Citing Article
  • February 2024

IEEE Transactions on Consumer Electronics

... Many existing approaches use supervised learning, which demands high-quality and abundant data. To mitigate the need for extensive labeled datasets, methods like zero-shot learning [107] and self-supervised learning [108] can be applied. Additionally, integrating prior knowledge about the physical features and application of drones can enhance algorithm interpretability while decreasing data dependence. ...

Distortion-Aware Self-Supervised Indoor 360° Depth Estimation via Hybrid Projection Fusion and Structural Regularities
  • Citing Article
  • January 2023

IEEE Transactions on Multimedia

... A series of detection methods such as SSD [24], YOLOv2 [25], and EfficientDet [26] eliminate the process of generating proposals, which significantly improves the detection speed, and the networks are designed to be more lightweight. A lightweight deep neural network was proposed based on recurrent learning, which employed a feature compression strategy to reduce the number of parameters of the model [27]. Currently, with the increasing in-depth research on anchor-free [28] object detection algorithms, the DETR [29] model uses a transformer [30] to directly predict the extracted features of the backbone network and thoroughly realizes end-to-end. ...

A Lightweight Recurrent Learning Network for Sustainable Compressed Sensing
  • Citing Article
  • January 2023

IEEE Transactions on Emerging Topics in Computational Intelligence

... EnlightenGAN incorporates a global-local discriminator structure to capture more detailed features, coupled with self-regularized perceptual loss and attention mechanisms for enhanced results. Recently, Wu et al. [27] introduced URetinex-Net, a Retinex-based deep unfolding network. This approach reformulates the optimization problem into a learnable network, effectively addressing the decomposition problem by implicitly regularizing the model. ...

URetinex-Net: Retinex-based Deep Unfolding Network for Low-light Image Enhancement
  • Citing Conference Paper
  • June 2022

... The next step in the process is to propose a multi-graph fusion model which determines valuable information from multi-modal images by selectively learning it. Based on adaptive viewpoint feature enhancement via binocular vision, [27] proposes a method for detecting visual saliency in stereoscopic images. An aggregation module is delicately designed for binocular stereoscopic saliency features, generating more representative saliency features towards binocular vision. ...

Adaptive Viewpoint Feature Enhancement-Based Binocular Stereoscopic Image Saliency Detection
  • Citing Article
  • October 2022

IEEE Transactions on Circuits and Systems for Video Technology

... Li et al. [2] proposed a bandit-based AOS method including four popular DE mutation operators, which selects operators based on the received credit values. Dong et al. [32] proposed an adaptive operator selection with test-and-apply structure (TAOS), which constructs an operator pool with four operators. In TAOS, the evolutionary process is divided into several continuous segments. ...

Adaptive Operator Selection with Test-and-Apply Structure for Decomposition-based Multi-objective Optimization
  • Citing Article
  • November 2021

Swarm and Evolutionary Computation

... Lastly, character concept generation from random noise [45], [51], [54], line drawings/sketches [46], [48] and blurred silhouettes [49]. Less explored generation problems include logos [103], pixel art characters [47], [48], [77], [90], sprite assets [13], [48], [90], map chunks [13], [90], input image pixelization [78], game effects [73] and Graphical User Interfaces (GUI) [86], [87], [88]. ...

Generative synthesis of logos across DCT domain
  • Citing Article
  • October 2021

Neurocomputing

... In recent years, deep learning-based approaches have made significant progress in the field of painting image processing. To address the painting image restoration problem, Chang et al. proposed an image restoration model that combines a deep context model with a back-propagation mechanism [7]. This model incorporates weighted contextual loss in the back-propagation process to extract richer edge features, thereby enhancing the performance of painting image restoration. ...

Fine Tuning of Deep Contexts Toward Improved Perceptual Quality of In-Paintings
  • Citing Article
  • September 2021

IEEE Transactions on Cybernetics

... As shown in Table 4, compared with schemes (a) and (b), our model has a significant advantage. Although scheme (c) has more viewpoints, according to the analysis by reference [50], most of the selected viewpoints are close to the central viewpoint, resulting in data redundancy, angle bias issues, and worse performance. In contrast to the proposed input scheme, the concentric-circle input scheme places more emphasis on horizontal and vertical views while downplaying diagonal views. ...

Geometry Auxiliary Salient Object Detection for Light Fields via Graph Neural Networks
  • Citing Article
  • September 2021

IEEE Transactions on Image Processing