Gang Niu’s research while affiliated with RIKEN Center for Advanced Intelligence Project and other places


Publications (149)


Learning Locally, Revising Globally: Global Reviser for Federated Learning with Noisy Labels
  • Preprint
  • File available

November 2024 · 5 Reads

Yuxin Tian · Mouxing Yang · Yuhao Zhou · [...] · Jiancheng Lv

The success of most federated learning (FL) methods depends heavily on label quality, yet high-quality labels are often unavailable in real-world scenarios such as medicine, leading to the federated label-noise learning (F-LNL) problem. In this study, we observe that the global model in FL memorizes noisy labels slowly. Based on this observation, we propose a novel approach dubbed Global Reviser for Federated Learning with Noisy Labels (FedGR) to enhance the label-noise robustness of FL. In brief, FedGR employs three novel modules to achieve noisy-label sniffing and refining, local knowledge revising, and local model regularization. Specifically, the global model is adopted to infer local data proxies for global sample selection and to refine incorrect labels. To maximize the utilization of local knowledge, we leverage the global model to revise the local exponential moving average (EMA) model of each client and distill it into the clients' models. Additionally, we introduce a global-to-local representation regularization to mitigate overfitting to noisy labels. Extensive experiments on three F-LNL benchmarks against seven baseline methods demonstrate the effectiveness of the proposed FedGR.
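A minimal sketch of the local-knowledge revising step described above, assuming the received global weights are blended into each client's EMA model and then distilled into the client's model; the mixing rule, coefficient, and temperature are illustrative assumptions rather than the authors' exact formulation:

```python
import torch
import torch.nn.functional as F

def revise_local_ema(ema_model, global_model, alpha=0.5):
    """Blend the received global weights into the client's local EMA model."""
    with torch.no_grad():
        for p_ema, p_glob in zip(ema_model.parameters(), global_model.parameters()):
            p_ema.mul_(alpha).add_(p_glob, alpha=1.0 - alpha)
    return ema_model

def distill_step(student, ema_teacher, x, temperature=2.0):
    """Distill the revised EMA teacher into the client's model on a batch x."""
    with torch.no_grad():
        teacher_logits = ema_teacher(x)
    student_logits = student(x)
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=1),
        F.softmax(teacher_logits / t, dim=1),
        reduction="batchmean",
    ) * (t * t)
```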




On Unsupervised Prompt Learning for Classification with Black-box Language Models

October 2024 · 9 Reads

Large language models (LLMs) have achieved impressive success in text-formatted learning problems, and most popular LLMs have been deployed in a black-box fashion. Meanwhile, fine-tuning is usually necessary for a specific downstream task to obtain better performance, and this functionality is provided by the owners of the black-box LLMs. To fine-tune a black-box LLM, labeled data are always required to adjust the model parameters. However, in many real-world applications, LLMs can label textual datasets with even better quality than skilled human annotators, motivating us to explore the possibility of fine-tuning black-box LLMs with unlabeled data. In this paper, we propose unsupervised prompt learning for classification with black-box LLMs, where the learning parameters are the prompt itself and the pseudo labels of unlabeled data. Specifically, the prompt is modeled as a sequence of discrete tokens, and every token has its own to-be-learned categorical distribution. For learning the pseudo labels, we are the first to consider the in-context learning (ICL) capabilities of LLMs: we first identify reliable pseudo-labeled data using the LLM, and then assign pseudo labels to other unlabeled data based on the prompt, allowing the pseudo-labeled data to serve as in-context demonstrations alongside the prompt. Those in-context demonstrations matter: in prior work they are used when the prompt is applied for prediction but not when the prompt is trained; taking them into account during training therefore makes the prompt-learning and prompt-using stages more consistent. Experiments on benchmark datasets show the effectiveness of our proposed algorithm. After unsupervised prompt learning, the pseudo-labeled dataset can be used for further fine-tuning by the owners of the black-box LLMs.
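A rough sketch of the prompt parameterization and demonstration-aware querying described above; the toy vocabulary, prompt length, and query format are assumptions for illustration, and the black-box LLM call and the prompt-update rule themselves are omitted:

```python
import numpy as np

VOCAB = ["classify", "the", "sentiment", "of", "this", "review", "text", ":"]
PROMPT_LEN = 6
rng = np.random.default_rng(0)

# One to-be-learned categorical distribution (here stored as logits) per prompt position.
logits = np.zeros((PROMPT_LEN, len(VOCAB)))

def sample_prompt(logits):
    """Sample one discrete token per position from its categorical distribution."""
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    tokens = [rng.choice(len(VOCAB), p=p) for p in probs]
    return " ".join(VOCAB[t] for t in tokens)

def build_query(prompt, demonstrations, x):
    """Prepend pseudo-labeled demonstrations so that training matches prediction."""
    demo_text = "\n".join(f"Input: {d_x}\nLabel: {d_y}" for d_x, d_y in demonstrations)
    return f"{prompt}\n{demo_text}\nInput: {x}\nLabel:"

demos = [("great movie, loved it", "positive")]
print(build_query(sample_prompt(logits), demos, "boring and too long"))
```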


Estimating Per-Class Statistics for Label Noise Learning

September 2024 · 14 Reads · 2 Citations

IEEE Transactions on Pattern Analysis and Machine Intelligence

Real-world data may contain a considerable amount of noisily labeled examples, which usually mislead the training algorithm and result in degraded classification performance on test data. Therefore, Label Noise Learning (LNL) was proposed, in which one popular research direction focuses on estimating critical statistics (e.g., the sample mean and sample covariance) to recover the clean data distribution. However, existing methods may suffer from an unreliable sample-selection process or can hardly be applied to multi-class cases. Inspired by centroid estimation theory, we propose Per-Class Statistic Estimation (PCSE), which establishes the quantitative relationship between the clean (first-order and second-order) statistics and the corresponding noisy statistics for every class. This relationship is further utilized to induce a generative classifier for model inference. Unlike existing methods, our approach does not require sample selection at the instance level. Moreover, PCSE can serve as a general post-processing strategy applicable to various popular networks pre-trained on noisy datasets to boost their classification performance. Theoretically, we prove that the estimated statistics converge to their ground-truth values as the sample size increases, even if the label transition matrix is biased. Empirically, we conduct extensive experiments on various binary and multi-class datasets, and the results demonstrate that PCSE achieves more precise statistic estimation as well as higher classification accuracy compared with state-of-the-art methods in LNL.
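To illustrate the kind of clean/noisy relationship such per-class statistics satisfy, the sketch below recovers clean per-class means from noisy-class means when the label transition matrix and clean class priors are known; the two-class numbers are made up for the demo and this is not the authors' estimator:

```python
import numpy as np

T = np.array([[0.8, 0.2],
              [0.3, 0.7]])            # row-stochastic: T[i, j] = P(noisy = j | clean = i)
priors = np.array([0.5, 0.5])         # clean class priors
clean_means = np.array([[0.0, 0.0],
                        [4.0, 4.0]])  # ground truth, used only to simulate noisy stats

# Noisy-class mean = mixture of clean-class means weighted by P(clean | noisy).
W = priors[:, None] * T               # joint P(clean, noisy)
W = W / W.sum(axis=0, keepdims=True)  # columns: P(clean | noisy)
noisy_means = W.T @ clean_means       # what could be estimated from the noisy data

# Recover the clean means by inverting the mixing relationship.
recovered = np.linalg.solve(W.T, noisy_means)
print(np.allclose(recovered, clean_means))  # True
```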



Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning

July 2024 · 1 Read

Semi-supervised multi-label learning (SSMLL) is a powerful framework for leveraging unlabeled data to reduce the expensive cost of collecting precise multi-label annotations. Unlike in semi-supervised learning, in SSMLL one cannot simply select the most probable label as the pseudo-label, because an instance may contain multiple semantics. To solve this problem, the mainstream method developed an effective thresholding strategy to generate accurate pseudo-labels; unfortunately, it neglected the quality of model predictions and its potential impact on pseudo-labeling performance. In this paper, we propose a dual-perspective method to generate high-quality pseudo-labels. To improve the quality of model predictions, we perform dual-decoupling to boost the learning of correlative and discriminative features, while refining the generation and utilization of pseudo-labels. To obtain proper class-wise thresholds, we propose a metric-adaptive thresholding strategy that estimates the thresholds which maximize pseudo-label performance for a given metric on labeled data. Experiments on multiple benchmark datasets show that the proposed method achieves state-of-the-art performance and outperforms the compared methods by a significant margin.
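A hedged sketch of what a metric-adaptive thresholding step could look like: for each class, pick the threshold that maximizes the chosen metric (F1 here) on the labeled data; the candidate grid and toy scores below are assumptions, not the paper's exact procedure:

```python
import numpy as np

def f1(pred, true):
    tp = np.sum(pred & true)
    if tp == 0:
        return 0.0
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(true.sum(), 1)
    return 2 * precision * recall / (precision + recall)

def metric_adaptive_thresholds(scores, labels, grid=np.linspace(0.05, 0.95, 19)):
    """scores, labels: (n_labeled, n_classes); returns one threshold per class."""
    thresholds = []
    for c in range(scores.shape[1]):
        best = max(grid, key=lambda t: f1(scores[:, c] >= t, labels[:, c] == 1))
        thresholds.append(best)
    return np.array(thresholds)

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=(100, 5))
scores = 0.6 * labels + 0.4 * rng.random((100, 5))  # toy, label-correlated scores
print(metric_adaptive_thresholds(scores, labels))
```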


Decoupling the Class Label and the Target Concept in Machine Unlearning

June 2024

Machine unlearning, an emerging research topic driven by data regulations, aims to adjust a trained model so that it approximates a model retrained without a portion of the training data. Previous studies showed that class-wise unlearning succeeds in forgetting the knowledge of a target class, through gradient ascent on the forgetting data or fine-tuning with the remaining data. However, while these methods are useful, they are insufficient because the class label and the target concept are often assumed to coincide. In this work, we decouple them by considering the label-domain mismatch and investigate three problems beyond conventional all-matched forgetting, namely target-mismatch, model-mismatch, and data-mismatch forgetting. We systematically analyze the new challenges in restrictively forgetting the target concept and reveal crucial forgetting dynamics at the representation level that make these tasks realizable. Based on that, we propose a general framework, namely TARget-aware Forgetting (TARF). It enables the additional tasks to actively forget the target concept while maintaining the rest, by simultaneously conducting annealed gradient ascent on the forgetting data and selected gradient descent on the hard-to-affect remaining data. Empirically, various experiments under the newly introduced settings demonstrate the effectiveness of TARF.
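A minimal sketch of the kind of objective described above, combining annealed gradient ascent on the forgetting data with gradient descent on remaining data; the linear annealing schedule and loss weighting are assumptions, not TARF's exact recipe:

```python
import torch
import torch.nn.functional as F

def forgetting_step(model, optimizer, forget_batch, remain_batch, step, total_steps):
    x_f, y_f = forget_batch
    x_r, y_r = remain_batch
    # Annealed coefficient: ascent pressure on the forgetting data decays over training.
    lam = 1.0 - step / total_steps
    loss_forget = F.cross_entropy(model(x_f), y_f)
    loss_remain = F.cross_entropy(model(x_r), y_r)
    loss = -lam * loss_forget + loss_remain   # ascent on forget, descent on remain
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```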


Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization

May 2024 · 10 Reads

In federated learning (FL), the multi-step updates and data heterogeneity among clients often lead to a loss landscape with sharper minima, degrading the performance of the resulting global model. Prevalent federated approaches incorporate sharpness-aware minimization (SAM) into local training to mitigate this problem. However, the local loss landscapes may not accurately reflect the flatness of the global loss landscape in heterogeneous environments; as a result, minimizing local sharpness and calculating perturbations on client data may prevent SAM in FL from matching its efficacy in centralized training. To overcome this challenge, we propose FedLESAM, a novel algorithm that locally estimates the direction of the global perturbation on the client side as the difference between the global models received in the previous active round and the current round. Besides the improved perturbation quality, FedLESAM also speeds up federated SAM-based approaches, since it performs only one backpropagation per iteration. Theoretically, we prove a slightly tighter bound than the original FedSAM by ensuring consistent perturbations. Empirically, we conduct comprehensive experiments on four federated benchmark datasets under three partition strategies to demonstrate the superior performance and efficiency of FedLESAM.
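A sketch of the locally-estimated-global-perturbation idea, assuming the SAM ascent direction is taken from the difference between the previously received and current global models so that each local step needs only one backward pass; variable names, the sign of the direction, and rho are assumptions:

```python
import torch
import torch.nn.functional as F

def local_step(model, optimizer, batch, prev_global, curr_global, rho=0.05):
    """prev_global / curr_global: lists of tensors holding the global weights
    received in the previous active round and the current round."""
    x, y = batch
    # Estimated global perturbation direction -- no extra backward pass needed.
    direction = [p_prev - p_curr for p_prev, p_curr in zip(prev_global, curr_global)]
    norm = torch.sqrt(sum((d ** 2).sum() for d in direction)).item() + 1e-12
    scale = rho / norm
    with torch.no_grad():                      # move to the perturbed point
        for p, d in zip(model.parameters(), direction):
            p.add_(d, alpha=scale)
    loss = F.cross_entropy(model(x), y)        # the single backward pass
    optimizer.zero_grad()
    loss.backward()
    with torch.no_grad():                      # restore the original weights
        for p, d in zip(model.parameters(), direction):
            p.sub_(d, alpha=scale)
    optimizer.step()                           # descend with the SAM-style gradient
    return loss.item()
```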


Balancing Similarity and Complementarity for Federated Learning

May 2024 · 3 Reads

In mobile and IoT systems, Federated Learning (FL) is increasingly important for effectively using data while maintaining user privacy. One key challenge in FL is managing statistical heterogeneity, such as non-i.i.d. data, arising from numerous clients and diverse data sources. This requires strategic cooperation, often with clients having similar characteristics. However, we are interested in a fundamental question: does achieving optimal cooperation necessarily entail cooperating with the most similar clients? In practice, significant performance improvements are often realized not by partnering with the most similar models but by leveraging complementary data. Our theoretical and empirical analyses suggest that optimal cooperation is achieved by enhancing complementarity in feature distribution while restricting the disparity in the correlation between features and targets. Accordingly, we introduce a novel framework, FedSaC, which balances similarity and complementarity in FL cooperation. Our framework aims to approximate an optimal cooperation network for each client by optimizing a weighted sum of model similarity and feature complementarity. The strength of FedSaC lies in its adaptability to various levels of data heterogeneity and multimodal scenarios. Our comprehensive unimodal and multimodal experiments demonstrate that FedSaC markedly surpasses other state-of-the-art FL methods.
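For illustration only, a sketch of how per-client cooperation weights could be formed from a weighted sum of model similarity and feature complementarity; the cosine/distance choices and the balancing weight beta are assumptions, not FedSaC's actual objective:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def cooperation_weights(model_vecs, feature_stats, beta=0.5):
    """model_vecs: flattened model parameters per client;
    feature_stats: per-client feature summaries (e.g., class-wise feature means)."""
    n = len(model_vecs)
    scores = np.full((n, n), -np.inf)          # -inf on the diagonal: no self-weight
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            sim = cosine(model_vecs[i], model_vecs[j])                   # model similarity
            comp = np.linalg.norm(feature_stats[i] - feature_stats[j])   # complementarity
            scores[i, j] = beta * sim + (1 - beta) * comp
    # Softmax each row into a cooperation weight vector for that client.
    exp = np.exp(scores - scores.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
print(cooperation_weights(rng.normal(size=(4, 10)), rng.normal(size=(4, 6))))
```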


Citations (29)


... It is still a challenge for the student model with poor domain-invariant feature extraction capabilities. Inspired by the spectral theorem that the frequency domain obeys the nature of global modeling [33], [34], we propose to improve the domain generalization ability of student models via Fast Fourier Transform (FFT) [25]: ...

Reference:

Domain-invariant Progressive Knowledge Distillation for UAV-based Object Detection
Direct Distillation Between Different Domains
  • Citing Chapter
  • October 2024

... A promising strategy is to leverage pre-trained Vision-Language Models (VLMs) (Radford et al. 2021; Li et al. 2023). This can be achieved through techniques such as fine-tuning VLMs on specific datasets, applying knowledge distillation (Yang et al. 2023; Liao et al. 2022), or directly integrating these models into the architecture (Gu et al. 2021; Ning et al. 2023). Given these insights, we leverage VLMs for zero-shot attribute classification, focusing on developing an effective and scalable approach to handle unseen classes. ...

Multi-Label Knowledge Distillation
  • Citing Conference Paper
  • October 2023

... However, in practical applications, these approaches might be ineffective because the original data is usually unavailable due to privacy concerns. To address the above issue, generation-based (Tran et al. 2024;Wang et al. 2024a,b) and collection-based (Chen et al. 2021b;Tang et al. 2023) DFKD methods have been proposed to train student networks using synthetic and collected data, respectively. The generation-based methods utilize the teacher network to guide a generator in producing examples from statistics in the teacher network or random noise. ...

Distribution Shift Matters for Knowledge Distillation with Webly Collected Images
  • Citing Conference Paper
  • October 2023

... The key to successful learning from ambiguous supervision is label disambiguation, i.e., identifying the true labels from the candidate sets. To achieve this, existing partial label learning (PLL) algorithms (Wang et al. 2022c;Yan and Guo 2023;Li et al. 2023;Jia et al. 2024) mostly rely on the self-training algorithm that assigns pseudo-labels using the model predictions. However, such a strategy can be problematic in our IPLL framework. ...

PiCO+: Contrastive Label Disambiguation for Robust Partial Label Learning
  • Citing Article
  • December 2023

IEEE Transactions on Pattern Analysis and Machine Intelligence

... Nonetheless, accurately modeling the general process, specifically the transition matrix, is ill-posed by only exploiting noisy data [Xia et al., 2019, Cheng et al., 2020]. The additional assumptions required for this model are often hard to validate and may not hold in real-world datasets, as indicated by recent studies [Xia et al., 2020, Yao et al., 2023b]. This mismatch frequently results in an inability to eliminate label noise thoroughly. ...

Causality Encourages the Identifiability of Instance-Dependent Label Noise
  • Citing Chapter
  • August 2023

... Then, the features are quantified and saved in the database, and finally, an index mechanism is established to facilitate users' query. In the field of clothing image retrieval, there are many ways to measure the similarity level of two images, one of which is Metric learning [36]. The accuracy of image retrieval has a great relationship with the selected similarity measurement method. ...

Boundary-restricted metric learning

Machine Learning

... Recent noisy label learning research has made great progress on classification tasks through various methods, such as sample selection [7]- [10], weight generation [11], [12], transfer matrix [13]- [15], and semi-supervised methods [16]- [18]. Nevertheless, little effort has been devoted to the problem of noisy labels in deep metric learning (DML). ...

A Parametrical Model for Instance-Dependent Label Noise
  • Citing Article
  • August 2023

IEEE Transactions on Pattern Analysis and Machine Intelligence

... 14) Adversarial Training against Poisoning Attack: a) Adversarial Training against Backdoor Attack: Gao et al. [537] evaluate the effectiveness of adversarial training against backdoor attacks across various settings. They show that the type and budget of perturbations used in AT are crucial. ...

On the Effectiveness of Adversarial Training Against Backdoor Attacks
  • Citing Article
  • June 2023

IEEE Transactions on Neural Networks and Learning Systems

... For instance, the Alaskan Malamute can be visually similar to the Siberian Husky, hindering non-experts from accurately identifying the true breeds. Recently, this ambiguous supervision problem has attracted great attention from the community (Lyu, Wu, and Feng 2022;Lv et al. 2023;Yan and Guo 2023;Jia et al. 2024). ...

On the Robustness of Average Losses for Partial-Label Learning
  • Citing Article
  • May 2023

IEEE Transactions on Pattern Analysis and Machine Intelligence