Fangzhao Wu’s research while affiliated with Microsoft and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (212)


Measuring Human Contribution in AI-Assisted Content Generation
  • Preprint

August 2024

·

24 Reads

Yueqi Xie

·

Tao Qi

·

·

[...]

·

Fangzhao Wu

With the growing prevalence of generative artificial intelligence (AI), an increasing amount of content is no longer exclusively generated by humans but by generative AI models with human guidance. This shift presents notable challenges for the delineation of originality due to the varying degrees of human contribution in AI-assisted works. This study raises the research question of measuring human contribution in AI-assisted content generation and introduces a framework to address this question that is grounded in information theory. By calculating mutual information between human input and AI-assisted output relative to self-information of AI-assisted output, we quantify the proportional information contribution of humans in content generation. Our experimental results demonstrate that the proposed measure effectively discriminates between varying degrees of human contribution across multiple creative domains. We hope that this work lays a foundation for measuring human contributions in AI-assisted content generation in the era of generative AI.


The overall framework of Selective-FD
The federated distillation involves four iterative steps. First, each client trains a personalized model using its local private data. Second, each client predicts the label of the proxy samples based on the local model. Third, the server aggregates these local predictions and returns the ensemble predictions to clients. Fourth, clients update local models by knowledge distillation based on the ensemble predictions. During the training process, the client-side selectors and the server-side selector aim to filter out misleading and ambiguous knowledge from the local predictions. Some icons in this figure are from icons8.com.
Predict the label of a proxy sample on the client side (left) and the server side (right)
The prediction takes the form of a soft label (i.e., a logits vector), where each element represents the probability of the corresponding label. The predicted label is the element with the highest probability. We measure the quality of knowledge in federated distillation by accuracy and precision. The accurate prediction matches the ground truth label, while misleading knowledge does not. Meanwhile, precise knowledge has low entropy, while ambiguity implies high entropy and uncertainty. The client-side selectors are responsible for filtering out incorrect local predictions, while the server-side selector aims to eliminate ambiguous knowledge. The X-ray icon in this figure is from Chanut-is-Industries, Flaticon.com.
Visualization of non-IID data distribution
The horizontal axis indexes the proxy dataset and the local datasets, while the vertical axis indicates class labels. The size of scattered points denotes the number of samples.
Test accuracy of different methods on the pneumonia detection task
The error bar represents the mean ± standard deviation of five repetitions. The results show that the proposed Selective-FD method achieves the best performance, and the accuracy gain is more significant when using hard labels to transfer knowledge. Specifically, some baselines perform even worse than the independent learning scheme. These results demonstrate that knowledge sharing among clients can mislead and negatively influence local training. The NIH Chest X-ray Dataset⁵³ and RSNA-STR Pneumonia Detection Challenge image datasets⁵⁴ are used for model training and testing. NIH Chest X-ray Dataset is available at https://nihcc.app.box.com/v/ChestXray-NIHCC, which is provided by the NIH Clinical Center. The RSNA-STR Pneumonia Detection Challenge image datasets is available at https://www.rsna.org/education/ai-resources-and-training/ai-image-challenge/RSNA-Pneumonia-Detection-Challenge-2018.
The AUROC scores for incorrect prediction detection (left) and the test accuracy after federated distillation (right)
The error bar represents the mean ± standard deviation across 10 clients. The results show that both the AUROC score and the accuracy of our Selective-FD method are much higher than the baselines, indicating its effectiveness in identifying unknown classes from the proxy dataset. This results in a remarkable performance gain in federated distillation.

+3

Selective knowledge sharing for privacy-preserving federated distillation without a good teacher
  • Article
  • Full-text available

January 2024

·

67 Reads

·

16 Citations

While federated learning (FL) is promising for efficient collaborative learning without revealing local data, it remains vulnerable to white-box privacy attacks, suffers from high communication overhead, and struggles to adapt to heterogeneous models. Federated distillation (FD) emerges as an alternative paradigm to tackle these challenges, which transfers knowledge among clients instead of model parameters. Nevertheless, challenges arise due to variations in local data distributions and the absence of a well-trained teacher model, which leads to misleading and ambiguous knowledge sharing that significantly degrades model performance. To address these issues, this paper proposes a selective knowledge sharing mechanism for FD, termed Selective-FD, to identify accurate and precise knowledge from local and ensemble predictions, respectively. Empirical studies, backed by theoretical insights, demonstrate that our approach enhances the generalization capabilities of the FD framework and consistently outperforms baseline methods. We anticipate our study to enable a privacy-preserving, communication-efficient, and heterogeneity-adaptive federated training framework.

Download


Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models

December 2023

·

3 Reads

The integration of large language models (LLMs) with external content has enabled more up-to-date and wide-ranging applications of LLMs, such as Microsoft Copilot. However, this integration has also exposed LLMs to the risk of indirect prompt injection attacks, where an attacker can embed malicious instructions within external content, compromising LLM output and causing responses to deviate from user expectations. To investigate this important but underexplored issue, we introduce the first benchmark for indirect prompt injection attacks, named BIPIA, to evaluate the risk of such attacks. Based on the evaluation, our work makes a key analysis of the underlying reason for the success of the attack, namely the inability of LLMs to distinguish between instructions and external content and the absence of LLMs' awareness to not execute instructions within external content. Building upon this analysis, we develop two black-box methods based on prompt learning and a white-box defense method based on fine-tuning with adversarial training accordingly. Experimental results demonstrate that black-box defenses are highly effective in mitigating these attacks, while the white-box defense reduces the attack success rate to near-zero levels. Overall, our work systematically investigates indirect prompt injection attacks by introducing a benchmark, analyzing the underlying reason for the success of the attack, and developing an initial set of defenses.


Defending ChatGPT against jailbreak attack via self-reminders

December 2023

·

565 Reads

·

107 Citations

Nature Machine Intelligence

ChatGPT is a societally impactful artificial intelligence tool with millions of users and integration into products such as Bing. However, the emergence of jailbreak attacks notably threatens its responsible and secure use. Jailbreak attacks use adversarial prompts to bypass ChatGPT’s ethics safeguards and engender harmful responses. This paper investigates the severe yet under-explored problems created by jailbreaks as well as potential defensive techniques. We introduce a jailbreak dataset with various types of jailbreak prompts and malicious instructions. We draw inspiration from the psychological concept of self-reminders and further propose a simple yet effective defence technique called system-mode self-reminder. This technique encapsulates the user’s query in a system prompt that reminds ChatGPT to respond responsibly. Experimental results demonstrate that self-reminders significantly reduce the success rate of jailbreak attacks against ChatGPT from 67.21% to 19.34%. Our work systematically documents the threats posed by jailbreak attacks, introduces and analyses a dataset for evaluating defensive interventions and proposes the psychologically inspired self-reminder technique that can efficiently and effectively mitigate against jailbreaks without further training.





Figure 1: Analysis of importance-rank changes of parameters in federated learning: (a) Averaged change in importance-ranks of parameters in benign local models after one training round with the standard deviation area shaded. (b) Comparison of change patterns under two different poisoning attack scenarios, where the disparity is measured by the difference in importance-rank changes between benign and poisoned models after one training round. (c) The disparity in change patterns of the untargeted attack under varying data heterogeneity determined by β.
Figure 2: Robustness test results against targeted attacks on CIFAR-10 with varying experimental settings. The results demonstrate that FedCPA consistently achieves the best defense performance (lowest ASR) compared with the baselines.
Figure 3: Qualitative analysis under a targeted attack scenario over TinyImagenet, where the highlighted part visualizes how the model recognizes class characteristics based on the Grad-CAM algorithm.
Figure 4: Comparison of change patterns over three datasets under two different poisoning attack scenarios, untargeted and targeted attack, where the disparity is measured by the difference in changes of importance rank between benign and poisoned models after one training round.
Towards Attack-tolerant Federated Learning via Critical Parameter Analysis

August 2023

·

27 Reads

Federated learning is used to train a shared model in a decentralized way without clients sharing private data with each other. Federated learning systems are susceptible to poisoning attacks when malicious clients send false updates to the central server. Existing defense strategies are ineffective under non-IID data settings. This paper proposes a new defense strategy, FedCPA (Federated learning with Critical Parameter Analysis). Our attack-tolerant aggregation method is based on the observation that benign local models have similar sets of top-k and bottom-k critical parameters, whereas poisoned local models do not. Experiments with different attack scenarios on multiple datasets demonstrate that our model outperforms existing defense strategies in defending against poisoning attacks.



Citations (46)


... The transferability of these attacks across different models, as shown by Zou et al. (2023), highlights the systemic nature of these vulnerabilities. Additionally, introduced reinforcement learning approaches for generating targeted adversarial prompts, while Yi et al. (2024) exposed how reverse alignment techniques can undermine LLM safety mechanisms. ...

Reference:

Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines
On the Vulnerability of Safety Alignment in Open-Access LLMs

... The gradient reveals the information intensity during optimization. Thus, magnitude and gradient acts as parameter importance metrics to select target elements, which has incurred huge research interest in broad fields, such as network pruning [18,24,46,102], domain generalization [72,86,107], federated learning [63,82], and malicious defense [25,32,33,108,110]. Existing explorations focus on training a network from scratch and face no requirement to preserve the previously learned knowledge, thus entangling the magnitude and gradient information to select the crucial elements for the target task. ...

Towards Attack-tolerant Federated Learning via Critical Parameter Analysis
  • Citing Conference Paper
  • October 2023

... Presently, PPTs in FL have garnered significant attention due to the increasing focus on safeguarding sensitive information during collaborative model training. Several works [31][32][33][34][35] have explored diverse privacy-preserving (PP) mecha-nisms, aiming to strike a balance between model accuracy and individual privacy. For instance, Chen et al. [31] proposed an efficient PP and traceable FL framework with minimal overhead. ...

Selective knowledge sharing for privacy-preserving federated distillation without a good teacher

... For prompt detection, perplexity detection (PPL [7]) stands out, particularly for identifying adversarial suffixes. Prompt modification encompasses two main approaches: one perturbs the original prompt to disable potential adversarial suffixes (S-LM [41]), and the other achieves defense by appending carefully crafted suffixes (PAT [34], ICD [47], and SR [48]). Regarding model fine-tuning, using synthetic safety preference data for fine-tuning (CST [19]) and helping unlearn harmful knowledge (SafeUnlearn [54], SU for short) are the most representative approaches to enhancing defense capability. ...

Defending ChatGPT against jailbreak attack via self-reminders

Nature Machine Intelligence

... SELFIE [50] robustly selects potential unclean samples and gradually add them into the training process. DivideMix [27] integrates multiple techniques including Co-teaching, data augmentation method MixUp [74] and one semi-supervised learning [73] framework MixMatch [4]. ...

Non-IID always Bad? Semi-Supervised Heterogeneous Federated Learning with Local Knowledge Enhancement
  • Citing Conference Paper
  • October 2023

... In the literature on recommendation systems, sequential recommendation is a highly regarded research topic, aiming to predict the next item a user might like based on their interaction history (Wang et al., 2022a, b;Xie et al., 2023). Various methods have been proposed to address the task of sequential recommendation. ...

Rethinking Multi-Interest Learning for Candidate Matching in Recommender Systems
  • Citing Conference Paper
  • September 2023

... Moreover, the number of clients sampled from each stratum is re-allocated using an optimal Neyman allocation, which minimizes the variance introduced by client sampling and significantly improves convergence rates. Data-level sampling methods, such as FedSampling (Qi et al., 2023), aim to mimic centralized learning by sampling local data from each participating client, followed by only using this data in the local training. ...

FedSampling: A Better Sampling Strategy for Federated Learning
  • Citing Conference Paper
  • August 2023

... A different solution proposed analyzing the critical parameters of the local models to reliably identify malicious clients and use it for weighted updates aggregation [20]. An attack-tolerant FL method was also proposed, presenting local meta updates and global knowledge distillation to mitigate possible malicious clients effect on the global model [21]. ...

FedDefender: Client-Side Attack-Tolerant Federated Learning
  • Citing Conference Paper
  • August 2023

... Yu et al. (2022) propose Tiny-Newsrec, a self-supervised domain-specific posttraining method to address the domain shift problem from pre-training tasks to downstream news recommendation. DebiasGAN (Wu et al., 2022a) alleviates position bias via adversarial learning by modelling the personalized effect of position bias on click behavior to estimate unbiased click scores. DIGAT (Mao et al., 2022) is a dual-interactive graph attention network to model user and news graph channels. ...

DebiasGAN: Eliminating Position Bias in News Recommendation with Adversarial Learning
  • Citing Conference Paper
  • January 2022

... Recent studies [43] have explored the use of language models in recommendation systems, categorizing their roles within the recommendation pipeline: (1) feature engineering [3,5,11,35,50,55,69,80], (2) feature encoding [17,25,26,37,54,63,76,88,89,94], and (3) scoring/ranking functions [2,8,21,27,29,30,33,36,38,48,52,58,77,90,95,97]. ...

Tiny-NewsRec: Effective and Efficient PLM-based News Recommendation
  • Citing Conference Paper
  • January 2022