Jian Sun’s research while affiliated with Shandong University and other places


Publications (398)


Multi-scale Part-based Feature Representation for 3D Domain Generalization and Adaptation
  • Article

November 2024

IEEE Transactions on Pattern Analysis and Machine Intelligence

Xin Wei

·

Xiang Gu

·

Jian Sun

Deep networks on 3D point clouds have achieved remarkable success in 3D classification, but they are vulnerable to geometric variations resulting from inconsistent data acquisition procedures. This gives rise to the challenging tasks of 3D domain generalization and adaptation, which aim to address the degradation in performance when a model trained on a source domain is applied to an out-of-distribution target domain. In this paper, we introduce a novel Multi-Scale Part-based feature Representation, dubbed MSPR, as a generalizable representation for point cloud domain generalization and adaptation. Rather than relying on a global point cloud feature representation, we align the part-level features of shapes at different scales to a set of learnable part-template features, which can encode local geometric structures shared between the source and target domains. Specifically, we construct a part-template feature space shared between the source and target domains. Shapes from different domains are organized into part-level features at various scales and then aligned to the part-template features. To leverage the generalization ability of small-scale parts and the discrimination ability of large-scale parts, we further design a cross-scale feature fusion module to exchange information between aligned part-based features at different scales. The fused part-based representations are finally aggregated by a part-based feature aggregation module. To improve the robustness of the aligned part-based representations and the global shape representation to geometric variations, we further propose a Contrastive Learning framework on Shape Representation (CLSR), applied to both 3D domain generalization and adaptation tasks. We conduct experiments on 3D domain generalization and adaptation benchmarks for point cloud classification. The results demonstrate the effectiveness of the proposed approach, which outperforms the previous state-of-the-art methods on both domain generalization and adaptation tasks. Ablation studies confirm the effectiveness of the proposed components of our model.
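A minimal PyTorch sketch of the core idea described in the abstract, aligning part-level features to a set of learnable part-template features via soft assignment, is given below. The module name, dimensions, and the soft-assignment aggregation rule are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartTemplateAlignment(nn.Module):
    """Sketch: align per-part features of a shape to learnable part-template
    features shared across domains, via soft assignment (assumed mechanism)."""
    def __init__(self, feat_dim: int = 256, num_templates: int = 16):
        super().__init__()
        # Learnable part-template features shared between source and target domains.
        self.templates = nn.Parameter(torch.randn(num_templates, feat_dim))

    def forward(self, part_feats: torch.Tensor) -> torch.Tensor:
        # part_feats: (B, N, D) part-level features at one scale.
        sim = part_feats @ self.templates.t()          # (B, N, K) similarities
        assign = F.softmax(sim, dim=-1)                # soft assignment to templates
        # Aggregate part features into K template-aligned slots.
        aligned = assign.transpose(1, 2) @ part_feats  # (B, K, D)
        aligned = aligned / (assign.sum(dim=1, keepdim=True).transpose(1, 2) + 1e-6)
        return aligned

if __name__ == "__main__":
    x = torch.randn(2, 1024, 256)                 # hypothetical batch of part features
    print(PartTemplateAlignment()(x).shape)       # torch.Size([2, 16, 256])
```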



SAM-UNet: Enhancing Zero-Shot Segmentation of SAM for Universal Medical Images

August 2024

·

23 Reads

Segment Anything Model (SAM) has demonstrated impressive performance on a wide range of natural image segmentation tasks. However, its performance significantly deteriorates when it is directly applied to the medical domain, due to the remarkable differences between natural images and medical images. Some researchers have attempted to train SAM on large-scale medical datasets; however, the experimental results show poor zero-shot performance. In this context, inspired by the superior performance of U-Net-like models in medical image segmentation, we propose SAM-UNet, a new foundation model which incorporates a U-Net into the original SAM, to fully leverage the powerful contextual modeling ability of convolutions. Specifically, we add a parallel convolutional branch in the image encoder, which is trained independently while the vision Transformer branch is kept frozen. Additionally, we employ multi-scale fusion in the mask decoder to facilitate accurate segmentation of objects at different scales. We train SAM-UNet on SA-Med2D-16M, the largest 2-dimensional medical image segmentation dataset to date, yielding a universal pretrained model for medical images. Extensive experiments are conducted to evaluate the performance of the model, and a state-of-the-art result is achieved, with a Dice similarity coefficient of 0.883 on the SA-Med2D-16M dataset. Specifically, in zero-shot segmentation experiments, our model not only significantly outperforms previous large medical SAM models across all modalities, but also substantially mitigates the performance degradation observed on unseen modalities. It should be highlighted that SAM-UNet is an efficient and extensible foundation model, which can be further fine-tuned for other downstream tasks in the medical community. The code is available at https://github.com/Hhankyangg/sam-unet.
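The sketch below illustrates the architectural idea of a trainable convolutional branch run in parallel with a frozen vision Transformer image encoder, with the two feature maps fused. The module names, strides, embedding size, and additive fusion are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class ParallelConvBranchEncoder(nn.Module):
    """Sketch: frozen ViT branch plus an independently trained convolutional
    branch, fused by addition (assumed fusion rule)."""
    def __init__(self, vit_encoder: nn.Module, embed_dim: int = 256):
        super().__init__()
        self.vit = vit_encoder
        for p in self.vit.parameters():     # ViT branch is kept frozen
            p.requires_grad = False
        # Trainable convolutional branch (stand-in for the U-Net encoder),
        # downsampling 16x to match an assumed ViT patch size of 16.
        self.conv_branch = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(64, embed_dim, 3, stride=4, padding=1),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            vit_feat = self.vit(image)       # (B, C, H/16, W/16) assumed
        conv_feat = self.conv_branch(image)  # spatial size matched by strides
        return vit_feat + conv_feat          # simple additive fusion

if __name__ == "__main__":
    dummy_vit = nn.Conv2d(3, 256, kernel_size=16, stride=16)  # stand-in for a ViT encoder
    enc = ParallelConvBranchEncoder(dummy_vit)
    print(enc(torch.randn(1, 3, 1024, 1024)).shape)  # torch.Size([1, 256, 64, 64])
```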


Prototypical Partial Optimal Transport for Universal Domain Adaptation

August 2024

Universal domain adaptation (UniDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain without requiring the label sets of the two domains to be identical. The presence of both domain shift and category shift makes the task challenging and requires us to distinguish "known" samples (i.e., samples whose labels exist in both domains) from "unknown" samples (i.e., samples whose labels exist in only one domain) in both domains before reducing the domain gap. In this paper, we consider the problem from the point of view of distribution matching, in which we only need to align the two distributions partially. A novel approach, dubbed mini-batch Prototypical Partial Optimal Transport (m-PPOT), is proposed to conduct partial distribution alignment for UniDA. In the training phase, besides minimizing m-PPOT, we also leverage the transport plan of m-PPOT to reweight source prototypes and target samples, and design a reweighted entropy loss and a reweighted cross-entropy loss to distinguish "known" and "unknown" samples. Experiments on four benchmarks show that our method outperforms the previous state-of-the-art UniDA methods.
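A minimal NumPy sketch of the prototypical partial OT idea follows: transport only a fraction of the probability mass between class prototypes and a target mini-batch, then read reweighting coefficients off the transport plan. It assumes the POT (Python Optimal Transport) library is available; the function name, uniform marginals, and mass fraction are illustrative assumptions, and the paper's reweighted entropy/cross-entropy losses are not shown.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (assumed available)

def ppot_reweight(prototypes: np.ndarray, target_feats: np.ndarray, mass: float = 0.7):
    """Sketch of mini-batch prototypical partial OT: transport `mass` of the
    probability between prototypes and a target batch, then use the plan's
    marginals as weights (low target weight suggests an "unknown" sample)."""
    K, B = len(prototypes), len(target_feats)
    a = np.full(K, 1.0 / K)                 # uniform mass on source prototypes
    b = np.full(B, 1.0 / B)                 # uniform mass on target samples
    M = ot.dist(prototypes, target_feats, metric="euclidean")
    plan = ot.partial.partial_wasserstein(a, b, M, m=mass)  # (K, B) plan
    proto_weight = plan.sum(axis=1)         # how strongly each prototype is matched
    target_weight = plan.sum(axis=0)        # how strongly each target sample is matched
    return proto_weight, target_weight

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    protos = rng.normal(size=(10, 64))      # hypothetical class prototypes
    feats = rng.normal(size=(32, 64))       # hypothetical target mini-batch features
    pw, tw = ppot_reweight(protos, feats)
    print(pw.shape, tw.shape, round(float(tw.sum()), 3))  # total transported mass ≈ 0.7
```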




Figure captions (from the article):
Training algorithm.
Architecture of our ARPM. Red (resp. blue) arrows indicate the computational flow for source (resp. target) domain data. Both source and target images are mapped to feature space by the feature extractor. Our adversarial reweighting model automatically reweights the importance of source domain data to match the target domain distribution in feature space, decreasing the importance of source-private class data. We then define a reweighted classification loss on the reweighted source domain data distribution to train the recognition model to classify common class data. An $\alpha$-power maximization is proposed to reduce the prediction uncertainty on the target domain. We also utilize neighborhood reciprocity clustering (Yang et al., 2021b) to improve the robustness of the recognition model on the target domain.
Illustration of negative transfer caused by the source-private class data in PDA. The source and target features are shown in red and blue, respectively. Some target domain samples are unavoidably aligned with the source-private class data during feature adaptation by distribution alignment, and are incorrectly recognized by the recognition model.
Intuitive motivations of ARPM. We reweight source domain data by our adversarial reweighting model to assign smaller weights to source-private class data. The classification loss can then enforce lower prediction uncertainty mainly on source domain common class data. We propose the $\alpha$-power maximization to lower the prediction uncertainty on target samples. Intuitively, to achieve lower prediction uncertainty, the target samples will be pushed toward the regions of source domain common class data.
We minimize the Wasserstein distance between the reweighted source feature distribution $\hat{P}_{\mathbf{z}}(\mathbf{w})$ and the target feature distribution $\hat{Q}_{\mathbf{z}}$ to learn the weights $\mathbf{w}$. This idea is further transformed into the adversarial reweighting model.

Adversarial Reweighting with α-Power Maximization for Domain Adaptation
  • Article
  • Publisher preview available

May 2024

·

38 Reads

·

1 Citation

International Journal of Computer Vision

Practical Domain Adaptation (DA) tasks, e.g., Partial DA (PDA), open-set DA, universal DA, and test-time adaptation, have gained increasing attention in the machine learning community. In this paper, we propose a novel approach, dubbed Adversarial Reweighting with α-Power Maximization (ARPM), for PDA, where the source domain contains private classes absent in the target domain. In ARPM, we propose a novel adversarial reweighting model that adversarially learns to reweight source domain data, identifying source-private class samples by assigning smaller weights to them and thereby mitigating potential negative transfer. Based on the adversarial reweighting, we train the transferable recognition model on the reweighted source distribution so that it can classify common class data. To reduce the prediction uncertainty of the recognition model on the target domain, we present an α-power maximization mechanism in ARPM, which enriches the family of uncertainty-reduction losses for PDA. Extensive experimental results on five PDA benchmarks, i.e., Office-31, Office-Home, VisDA-2017, ImageNet-Caltech, and DomainNet, show that our method is superior to recent PDA methods. Ablation studies also confirm the effectiveness of the components of our approach. To theoretically analyze our method, we derive an upper bound on the target domain expected error for PDA, which is approximately minimized in our approach. We further extend ARPM to open-set DA, universal DA, and test-time adaptation, and verify its usefulness through experiments.
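One plausible form of an α-power maximization term is sketched below: for α > 1, maximizing the sum of the α-powers of the predicted class probabilities pushes predictions toward low-entropy outputs. Whether this is the exact loss used in ARPM is an assumption; the snippet only illustrates the uncertainty-reduction mechanism the abstract describes.

```python
import torch
import torch.nn.functional as F

def alpha_power_loss(logits: torch.Tensor, alpha: float = 2.0) -> torch.Tensor:
    """Hedged sketch of alpha-power maximization: negated so that minimizing
    this loss maximizes sum_j p_j^alpha, which favors confident predictions."""
    probs = F.softmax(logits, dim=1)
    return -(probs ** alpha).sum(dim=1).mean()

# Quick check: confident predictions yield a lower (more negative) loss.
confident = torch.tensor([[8.0, 0.0, 0.0]])
uncertain = torch.tensor([[0.1, 0.0, 0.0]])
assert alpha_power_loss(confident) < alpha_power_loss(uncertain)
```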


Uncovering critical transitions and molecule mechanisms in disease progressions using Gaussian graphical optimal transport

April 2024

·

18 Reads

Understanding disease progression is crucial for detecting critical transitions and finding trigger molecules, thereby facilitating early diagnosis and intervention. However, the high dimensionality of the data and the lack of aligned samples across disease stages have made these tasks challenging. We present a novel framework, Gaussian Graphical Optimal Transport (GGOT), for analyzing disease progression. The proposed GGOT uses Gaussian graphical models, incorporating protein interaction networks, to characterize the data distributions at different disease stages. We then use population-level optimal transport to calculate the Wasserstein distances and transport maps between stages, enabling us to detect critical transitions. By analyzing the per-molecule transport distance, we quantify the importance of each molecule and identify trigger molecules. Moreover, GGOT predicts the occurrence of critical transitions in unseen samples and visualizes the disease progression process. We apply GGOT to a simulation dataset and six disease datasets with varying progression rates, demonstrating its effectiveness in detecting critical transitions and identifying trigger molecules.
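Since GGOT models each stage with a Gaussian graphical model, the stage-to-stage transport cost reduces to the standard closed-form 2-Wasserstein distance between Gaussians. The snippet below shows only that textbook formula, not the paper's full GGOT pipeline; the function name and test data are illustrative.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2(m1: np.ndarray, S1: np.ndarray, m2: np.ndarray, S2: np.ndarray) -> float:
    """2-Wasserstein distance between N(m1, S1) and N(m2, S2):
    W2^2 = ||m1 - m2||^2 + tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2})."""
    S2_half = np.real(sqrtm(S2))
    cross = np.real(sqrtm(S2_half @ S1 @ S2_half))
    w2_sq = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return float(np.sqrt(max(w2_sq, 0.0)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(5, 5)); S1 = A @ A.T + np.eye(5)   # SPD covariance, stage 1
    B = rng.normal(size=(5, 5)); S2 = B @ B.T + np.eye(5)   # SPD covariance, stage 2
    print(gaussian_w2(np.zeros(5), S1, np.ones(5), S2))
```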


Polarimetry-Inspired Contrastive Learning for Class-Imbalanced PolSAR Image Classification

January 2024

·

11 Reads

·

5 Citations

IEEE Transactions on Geoscience and Remote Sensing

In recent years, deep neural networks have significantly boosted the performance of polarimetric synthetic aperture radar (PolSAR) image classification. However, existing deep learning-based approaches still suffer from the following limitations. First, their performance depends on the availability of massive annotations, which are difficult to acquire for PolSAR images. Second, the class imbalance in PolSAR data greatly hinders the correct classification of minority yet equally pivotal classes. To overcome these shortcomings, we propose a polarimetry-inspired contrastive learning (CL) PolSAR image classification (PiCL) approach, aiming to elevate classification accuracy by taking advantage of polarimetric domain knowledge. First, a complex-valued CL (CVCL) framework is designed, via which powerful polarimetric representations are learned without any manual annotations. Specifically, we design two distribution-inspired positive sample generation (PSG) strategies, i.e., Wishart-distance-based PSG (WishartPSG) and noise-injection PSG (NoisePSG), to enable discriminative and domain-specific representation learning. A novel hybrid anti-imbalance scheme is further devised to tackle the class imbalance issue, which combines contextual consistency-based pseudo-label generation (PLG) with a weighted feature-level synthetic data oversampling technique. It should be highlighted that the domain knowledge of PolSAR, including the data and noise distributions, the complex-valued (CV) characteristics, and the spatial consistency prior, is fully exploited throughout our model design. Extensive experiments on four benchmark datasets demonstrate the effectiveness of the proposed model. For the Flevoland 1989 dataset, our method improves the overall accuracy (OA), average accuracy (AA), and Kappa metrics by 3.54%, 6.81%, and 7.29%, respectively, compared to the existing state-of-the-art method. Our code will be available at https://github.com/HaixiaBi1982/PiCL.
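The WishartPSG strategy presumably relies on a Wishart-type distance between PolSAR coherency/covariance matrices. The snippet below shows one common symmetrised revised Wishart distance; whether this exact variant is the one used in the paper is an assumption, and the function name and example data are illustrative.

```python
import numpy as np

def sym_revised_wishart_distance(T1: np.ndarray, T2: np.ndarray) -> float:
    """Symmetrised revised Wishart distance between q x q Hermitian
    positive-definite matrices: d = 0.5 * (tr(T2^{-1} T1) + tr(T1^{-1} T2)) - q."""
    q = T1.shape[0]
    d = 0.5 * (np.trace(np.linalg.solve(T2, T1)) +
               np.trace(np.linalg.solve(T1, T2))) - q
    return float(np.real(d))

# Pixels whose matrices lie within a small Wishart distance of an anchor pixel
# could be treated as positives for contrastive learning.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    T = A @ A.conj().T + np.eye(3)                      # Hermitian positive definite
    print(sym_revised_wishart_distance(T, T))           # ~0 for identical matrices
```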


Adversarial data splitting for domain generalization

December 2023

·

18 Reads

·

1 Citation

Science China Information Sciences

Domain generalization aims to learn a model that generalizes to an unseen target domain, a fundamental and challenging task in machine learning for out-of-distribution generalization. This paper proposes a novel domain generalization approach that encourages the learned model to generalize well over train/val subset splittings of the training dataset. This idea is modeled as an adversarial data splitting framework, formulated as a min-max optimization problem inspired by meta-learning. The min-max problem is solved by iteratively splitting the training dataset into train and val subsets to maximize the domain shift measured by the objective function, and then updating the model parameters to generalize well from the train subset to the val subset by minimizing the objective function. This adversarial training approach does not assume known domain labels for the training data; instead, it automatically discovers "hard" train/val splittings from which to learn a generalizable model. Extensive experimental results on three benchmark datasets demonstrate the superiority of this approach. In addition, we derive a generalization error bound for a theoretical understanding of the proposed approach.
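A hedged PyTorch sketch of one adversarial-splitting iteration is given below. It approximates the max step by sampling a few random train/val splits and keeping the one with the largest val loss, and simplifies the min step to a joint update on the chosen train and val subsets; this candidate-sampling, first-order scheme is an illustrative simplification, not the paper's exact optimization procedure, and all names are hypothetical.

```python
import torch
import torch.nn as nn

def adversarial_split_step(model, loss_fn, X, y, optimizer,
                           n_candidates: int = 8, val_frac: float = 0.3) -> float:
    """One sketched min-max iteration: pick a 'hard' train/val split (max step),
    then update the model on that split (simplified min step)."""
    n = len(X)
    n_val = max(1, int(val_frac * n))
    worst_loss, worst_perm = None, None
    with torch.no_grad():                       # max step: find the hardest split
        for _ in range(n_candidates):
            perm = torch.randperm(n)
            val_loss = loss_fn(model(X[perm[:n_val]]), y[perm[:n_val]])
            if worst_loss is None or val_loss > worst_loss:
                worst_loss, worst_perm = val_loss, perm
    train_idx, val_idx = worst_perm[n_val:], worst_perm[:n_val]
    optimizer.zero_grad()                       # min step: reduce loss on both subsets
    loss = loss_fn(model(X[train_idx]), y[train_idx]) + \
           loss_fn(model(X[val_idx]), y[val_idx])
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = nn.Linear(16, 4)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    X, y = torch.randn(64, 16), torch.randint(0, 4, (64,))
    print(adversarial_split_step(model, nn.CrossEntropyLoss(), X, y, opt))
```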


Citations (52)


... [10] proposes a two-staged contrastive learning based network for polarimetric representation learning, and designs a Transformer-based sub-patch attention encoder to model the context within patch samples. [15,29,16] put forward self-supervised PolSAR image classification methods without the involvement of negative samples. ...

Reference:

Dual-branch PolSAR Image Classification Based on GraphMAE and Local Feature Extraction
Polarimetry-Inspired Contrastive Learning for Class-Imbalanced PolSAR Image Classification
  • Citing Article
  • January 2024

IEEE Transactions on Geoscience and Remote Sensing

... Some earlier approaches [57]–[60] utilize encoder-decoder structures to construct the relationship between images and text for editing. To enable semantic editing, some approaches [19], [35], [39]–[41] propose editing within GANs, leveraging pre-trained large-scale multi-modal models, e.g., CLIP [37]. StyleCLIP [19] introduces a paradigm to optimize the latent code of GANs. ...

Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images
  • Citing Conference Paper
  • October 2023

... Simple optimization algorithms are surprisingly able to find uneventful descent paths in the non-convex cost landscape of deep learning networks (Gu et al., 2021;Sagun et al., 2017). However they also tend to construct features that capture spurious correlations (Szegedy et al., 2014;Beery et al., 2018;Arjovsky et al., 2020;Ilyas et al., 2019). ...

Adversarial data splitting for domain generalization
  • Citing Article
  • December 2023

Science China Information Sciences

... Finally, it can be observed that the dual learning structure proposed initially by Cao et al. [23] and adapted for the bovine problem in this article is of great importance for the final result. In their paper, Yu et al. [36] adopted a similar methodology in the Augmented Feature Adaptation layer of their model, learning to distance the classes from each other according to the features extracted by their Augmented Feature Generation layer and similar ways to measure loss. ...

Contrasting augmented features for domain adaptation with limited target domain data
  • Citing Article
  • November 2023

Pattern Recognition

... Therefore, in many studies, images with shorter acquisition times are used as auxiliary modalities (prior information) to guide the accelerated imaging of the target modality with longer acquisition times [14,26,27]. Joint reconstruction explores complementary information among multi-modal images in varying degrees [12,28,29]. In addition, super-resolution reconstruction [30][31][32] can enhance image resolution and introduce high-resolution (HR) modalities to guide the restoration of low-resolution (LR) modalities. ...

MD-GraphFormer: A Model-Driven Graph Transformer for Fast Multi-Contrast MR Imaging
  • Citing Article
  • January 2023

IEEE Transactions on Computational Imaging

... To address this, we set a limit on the amount of synthetic data produced for training the student model [17]. This approach aligns with practices in domains such as Continual Learning [15,16] and Federated Learning [27,41], where synthetic data generation is also restricted to manage resource demands. In our study, we test various data ratios on ImageNet1k and its subsets to provide a balanced comparison across methods. ...

Memory efficient data-free distillation for continual learning
  • Citing Article
  • August 2023

Pattern Recognition

... The numerous benefits of these spatial control methods could be valuable for conditional generation in the 3D medical image domain, where publicly available training data is limited compared to natural images [32], and the huge training costs of 3D diffusion models are usually unaffordable with an enterprise-level GPU. Especially, their application in 3D medical images has the potential for downstream tasks, including super-resolution [53] and image translation [55] with conditions of partial images and other imaging scans. Furthermore, such control methods can be utilized to synthesize high-quality, precise, and privacy-concern-free medical images with the same semantics as real patients, thus addressing the scarcity of public data in medical imaging [3,23,44]. ...

Learning Unified Hyper-Network for Multi-Modal MR Image Synthesis and Tumor Segmentation With Missing Modalities
  • Citing Article
  • August 2023

IEEE Transactions on Medical Imaging

... Domain generalization for dense predictions. Domain Generalization (DG) methods for dense predictions [5,9,19,35,39,42,46,50,61,73,79,82,85] have recently garnered considerable attention due to their practical demands. These methods can be categorized along two main axes: i) data augmentation or domain randomization, both of which diversify the training process, and ii) alignment techniques to suppress domain-relevant features. ...

Generalized Semantic Segmentation by Self-Supervised Source Domain Projection and Multi-Level Contrastive Learning
  • Citing Article
  • June 2023

Proceedings of the AAAI Conference on Artificial Intelligence

... To address this, we set a limit on the amount of synthetic data produced for training the student model [17]. This approach aligns with practices in domains such as Continual Learning [15,16] and Federated Learning [27,41], where synthetic data generation is also restricted to manage resource demands. In our study, we test various data ratios on ImageNet1k and its subsets to provide a balanced comparison across methods. ...

Variational Data-Free Knowledge Distillation for Continual Learning
  • Citing Article
  • May 2023

IEEE Transactions on Pattern Analysis and Machine Intelligence

... To comprehensively validate the effectiveness of the proposed method, similar to prior methods (Chen et al., 2022a;Li et al., 2023a;, we choose two commonly used basic architectures including the U-shape hierarchical architecture shown in Fig. 3(c) and the columnar architecture shown in Fig. 8 of Appx. A.1. ...

Simple Baselines for Image Restoration
  • Citing Chapter
  • November 2022

Lecture Notes in Computer Science