Chaowei Xiao’s research while affiliated with University of Wisconsin–Madison and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (130)


OET: Optimization-based prompt injection Evaluation Toolkit
  • Preprint

May 2025

Jinsheng Pan

·

Xiaogeng Liu

·

Chaowei Xiao

Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation, enabling their widespread adoption across various domains. However, their susceptibility to prompt injection attacks poses significant security risks, as adversarial inputs can manipulate model behavior and override intended instructions. Despite numerous defense strategies, a standardized framework to rigorously evaluate their effectiveness, especially under adaptive adversarial scenarios, is lacking. To address this gap, we introduce OET, an optimization-based evaluation toolkit that systematically benchmarks prompt injection attacks and defenses across diverse datasets using an adaptive testing framework. Our toolkit features a modular workflow that facilitates adversarial string generation, dynamic attack execution, and comprehensive result analysis, offering a unified platform for assessing adversarial robustness. Crucially, the adaptive testing framework leverages optimization methods with both white-box and black-box access to generate worst-case adversarial examples, thereby enabling strict red-teaming evaluations. Extensive experiments underscore the limitations of current defense mechanisms, with some models remaining susceptible even after implementing security enhancements.


Doxing via the Lens: Revealing Privacy Leakage in Image Geolocation for Agentic Multi-Modal Large Reasoning Model

April 2025

·

4 Reads

Weidi Luo

·

Qiming Zhang

·

Tianyu Lu

·

[...]

·

Chaowei Xiao

The increasing capabilities of agentic multi-modal large reasoning models, such as ChatGPT o3, have raised critical concerns regarding privacy leakage through inadvertent image geolocation. In this paper, we conduct the first systematic and controlled study on the potential privacy risks associated with visual reasoning abilities of ChatGPT o3. We manually collect and construct a dataset comprising 50 real-world images that feature individuals alongside privacy-relevant environmental elements, capturing realistic and sensitive scenarios for analysis. Our experimental evaluation reveals that ChatGPT o3 can predict user locations with high precision, achieving street-level accuracy (within one mile) in 60% of cases. Through analysis, we identify key visual cues, including street layout and front yard design, that significantly contribute to the model inference success. Additionally, targeted occlusion experiments demonstrate that masking critical features effectively mitigates geolocation accuracy, providing insights into potential defense mechanisms. Our findings highlight an urgent need for privacy-aware development for agentic multi-modal large reasoning models, particularly in applications involving private imagery.


Pipeline for ProteinDT pretraining framework and downstream tasks
a, A contrastive learning module. ProteinCLAP aligns the representation space of the text and protein sequence modalities. b, A facilitator module. ProteinFacilitator augments the mapping from text-sequence representation to protein-sequence representation. c, A protein-sequence decoder module. It generates protein sequences conditioned on the representations produced from previous steps. d, Downstream text-to-protein generation task. e, Downstream text-guided protein editing task. f, Downstream protein property prediction task.
Visualization of text-to-protein generation and text-guided protein editing
a, Visualization for evaluation of text-to-protein generation. The pretrained ProteinCLAP is used to calculate the similarity between the sampled text and generated protein sequence pairs. b,c, Two methods for text-guided protein editing: latent interpolation (b) and latent optimization (c). SLERP, spherical linear interpolation. d,e, Visualization for evaluation of text-guided protein editing: structure, stability and region (d) and peptide binding (e). For four types of editing task, different evaluation metrics (marked in red) are applied accordingly.
Visual analysis on text-guided protein editing with latent optimization
a–c, Visualization of structure editing on α-helices: the input protein (a), the optimized protein with more α-helices (b) and the optimized protein with less α-helices (c). d–f, Visualization of structure editing on β-sheets: the input protein (d), the optimized protein with more β-sheets (e) and the optimized protein with less β-sheets (f). g–i, Visualization of peptide-binding editing on Protein Data Bank 3IQI: the input protein (g), the optimized protein with higher binding affinity (h) and the optimized protein with lower binding affinity (i).
A text-guided protein design framework
  • Article
  • Publisher preview available

March 2025

·

19 Reads

·

6 Citations

Nature Machine Intelligence

Current AI-assisted protein design utilizes mainly protein sequential and structural information. Meanwhile, there exists tremendous knowledge curated by humans in text format describing proteins’ high-level functionalities, yet whether the incorporation of such text data can help in protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multimodal framework that leverages textual descriptions for protein design. ProteinDT consists of three consecutive steps: ProteinCLAP, which aligns the representation of two modalities, a facilitator that generates the protein representation from the text modality and a decoder that creates the protein sequences from the representation. To train ProteinDT, we construct a large dataset, SwissProtCLAP, with 441,000 text and protein pairs. We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90% accuracy for text-guided protein generation; (2) best hit ratio on 12 zero-shot text-guided protein editing tasks; (3) superior performance on four out of six protein property prediction benchmarks.

View access options

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

February 2025

·

155 Reads

Generative Foundation Models (GenFMs) have emerged as transformative tools. However, their widespread adoption raises critical concerns regarding trustworthiness across dimensions. This paper presents a comprehensive framework to address these challenges through three key contributions. First, we systematically review global AI governance laws and policies from governments and regulatory bodies, as well as industry practices and standards. Based on this analysis, we propose a set of guiding principles for GenFMs, developed through extensive multidisciplinary collaboration that integrates technical, ethical, legal, and societal perspectives. Second, we introduce TrustGen, the first dynamic benchmarking platform designed to evaluate trustworthiness across multiple dimensions and model types, including text-to-image, large language, and vision-language models. TrustGen leverages modular components--metadata curation, test case generation, and contextual variation--to enable adaptive and iterative assessments, overcoming the limitations of static evaluation methods. Using TrustGen, we reveal significant progress in trustworthiness while identifying persistent challenges. Finally, we provide an in-depth discussion of the challenges and future directions for trustworthy GenFMs, which reveals the complex, evolving nature of trustworthiness, highlighting the nuanced trade-offs between utility and trustworthiness, and consideration for various downstream applications, identifying persistent challenges and providing a strategic roadmap for future research. This work establishes a holistic framework for advancing trustworthiness in GenAI, paving the way for safer and more responsible integration of GenFMs into critical applications. To facilitate advancement in the community, we release the toolkit for dynamic evaluation.


AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection

February 2025

·

1 Citation

The rapid advancements in Large Language Models (LLMs) have enabled their deployment as autonomous agents for handling complex tasks in dynamic environments. These LLMs demonstrate strong problem-solving capabilities and adaptability to multifaceted scenarios. However, their use as agents also introduces significant risks, including task-specific risks, which are identified by the agent administrator based on the specific task requirements and constraints, and systemic risks, which stem from vulnerabilities in their design or interactions, potentially compromising confidentiality, integrity, or availability (CIA) of information and triggering security risks. Existing defense agencies fail to adaptively and effectively mitigate these risks. In this paper, we propose AGrail, a lifelong agent guardrail to enhance LLM agent safety, which features adaptive safety check generation, effective safety check optimization, and tool compatibility and flexibility. Extensive experiments demonstrate that AGrail not only achieves strong performance against task-specific and system risks but also exhibits transferability across different LLM agents' tasks.


Figure 1: An example of the influence of the visual modality on safety alignment of LLaVA. The incorporation of the vision module undermines the safety mechanism of the language module.
Figure 2: Visialization of LLaVA's hidden states under 2-dimensional PCA. We plot harmful/harmless queries with/without the blank image. Harmful and harmless queries without an image can be largely distinguished while the difference is blurred with blank image.
Figure 3: Visialization of hidden states by LLaVA with VLM-GUARD under 2-dimensional PCA. The distinction between harmful and harmless queries is maintained and even strengthened after applying VLM-GUARD.
Security and quality performance of vanilla LLaVA and with different safety alignment mechanisms. Lower ASR and lower PPL signifies a safer and natural model. The lowest ASR and PPL are marked in bold.
VLM-Guard: Safeguarding Vision-Language Models via Fulfilling Safety Alignment Gap

February 2025

·

30 Reads

The emergence of vision language models (VLMs) comes with increased safety concerns, as the incorporation of multiple modalities heightens vulnerability to attacks. Although VLMs can be built upon LLMs that have textual safety alignment, it is easily undermined when the vision modality is integrated. We attribute this safety challenge to the modality gap, a separation of image and text in the shared representation space, which blurs the distinction between harmful and harmless queries that is evident in LLMs but weakened in VLMs. To avoid safety decay and fulfill the safety alignment gap, we propose VLM-Guard, an inference-time intervention strategy that leverages the LLM component of a VLM as supervision for the safety alignment of the VLM. VLM-Guard projects the representations of VLM into the subspace that is orthogonal to the safety steering direction that is extracted from the safety-aligned LLM. Experimental results on three malicious instruction settings show the effectiveness of VLM-Guard in safeguarding VLM and fulfilling the safety alignment gap between VLM and its LLM component.


Position: TRUSTLLM: Trustworthiness in Large Language Models

January 2025

·

103 Reads

·

54 Citations

Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustwor-thiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TRUSTLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, * Major contribution , § Corresponding author


Fig. 1. An overview of DreamDrive. Given an input image, our method can generate a 4D spatio-temporal driving scene, where we can render 3D-consistent dynamic driving videos with any driving trajectories.
DreamDrive: Generative 4D Scene Modeling from Street View Images

December 2024

·

27 Reads

Synthesizing photo-realistic visual observations from an ego vehicle's driving trajectory is a critical step towards scalable training of self-driving models. Reconstruction-based methods create 3D scenes from driving logs and synthesize geometry-consistent driving videos through neural rendering, but their dependence on costly object annotations limits their ability to generalize to in-the-wild driving scenarios. On the other hand, generative models can synthesize action-conditioned driving videos in a more generalizable way but often struggle with maintaining 3D visual consistency. In this paper, we present DreamDrive, a 4D spatial-temporal scene generation approach that combines the merits of generation and reconstruction, to synthesize generalizable 4D driving scenes and dynamic driving videos with 3D consistency. Specifically, we leverage the generative power of video diffusion models to synthesize a sequence of visual references and further elevate them to 4D with a novel hybrid Gaussian representation. Given a driving trajectory, we then render 3D-consistent driving videos via Gaussian splatting. The use of generative priors allows our method to produce high-quality 4D scenes from in-the-wild driving data, while neural rendering ensures 3D-consistent video generation from the 4D scenes. Extensive experiments on nuScenes and street view images demonstrate that DreamDrive can generate controllable and generalizable 4D driving scenes, synthesize novel views of driving videos with high fidelity and 3D consistency, decompose static and dynamic elements in a self-supervised manner, and enhance perception and planning tasks for autonomous driving.




Citations (50)


... Leveraging recent advancements in transformer-based architectures has resulted in promising performance. For instance, leveraging contrastive learning and generative models, Liu et al. [208] proposed ProteinDT for text-guided protein generation to property prediction. Trained on a large-scale dataset (SwissProtCLAP, with 441K text and protein pairs), it outperformed SOTA methods in protein property prediction benchmarks [209]- [211]. ...

Reference:

A Comprehensive Survey of Foundation Models in Medicine
A text-guided protein design framework

Nature Machine Intelligence

... MetaAgents, while providing a new possibility for LLM-based agents, also raises ethical concerns. The first concern is the trustworthiness issues, such as fairness, transparency, and accountability [26]. LLMs may generate undesired output, such as gender stereotypes and harmful opinions, which may be amplified through interactions in multi-agent settings. ...

Position: TRUSTLLM: Trustworthiness in Large Language Models

... This is largely based on parameter-efficient finetuning through low-rank adaptation (LoRA) (Hu et al., 2021) and DPO on its weights in decoder mode. DPO post-training has been shown to improve performance in related downstream prediction tasks using relatively little preference data, even in high-performance computing (HPC) applications (Dharuman et al., 2024). Additionally, we test parsing performance under simulated image degradation to mimic low-quality scans. ...

MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design Workflows with Direct Preference Optimization

... Training-time methods primarily focus on optimizing the model architecture (Liu et al., 2024b) and loss functions (Chakraborty et al., 2024), or constructing safety instruction data to encompass a broader range of risks to boost the model through Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) (Zhang et al., 2024e;Chen et al., 2023;Zong et al., 2024;Li et al., 2024c). Inference-time methods typically employ additional detectors to discriminate risk in the model input or output, regenerating safe responses if the initial replies are deemed unsafe (Gou et al., 2024;Wang et al., 2024d;Zhang et al., 2023;Pi et al., 2024;. Although these methods demonstrate progress, they often assume that the model inherently possesses the ability to identify complex risks during training or inference. ...

AdaShield : Safeguarding Multimodal Large Language Models from Structure-Based Attack via Adaptive Shield Prompting
  • Citing Chapter
  • November 2024

... Toward meeting the growing needs of real-world applications, vision-enabled large language models (VLLMs), which process both visual and textual inputs, have become remarkably proficient at a wide range of tasks like visual question-answering, reasoning, and zero-shot classification (Liu et al., 2024a;Ma et al., 2023b). Compared with single-modality models, the space of possible attacks on VLLMs is significantly larger: beyond the fact that attackers can potentially manipulate both inputs Qi et al., 2023), the safe deployment of a VLLM for many tasks-e.g., autonomous vehicle stacks or military robotics, among many others-requires robust processing and interpretation of visual information (Eykholt et al., 2018;Julian et al., 2020). ...

Dolphins: Multimodal Language Model for Driving
  • Citing Chapter
  • November 2024

... They generate samples by iteratively removing noise and refining predictions, starting from pure Gaussian noise. With a focus on diversity, generative models are also used to go beyond motion prediction to motion generation [641,510,131,511], a valuable tool for simulation and data generation. ...

RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
  • Citing Chapter
  • October 2024

... From a data-centric perspective, data poisoning attacks pose a significant threat to the integrity and reliability of LLMs by corrupting the training datasets [131,132] In the RLHF stage, the integrity of the model's learning process can be compromised through the poisoning of reward models [1,137,138,139,140,141]. A critical example is the RankPoison attack introduced by [142], which manipulates reward signals by strategically corrupting human preference datasets. Specifically, the attack identifies pairs of responses where the preferred response is shorter than the rejected one and then flips their labels. ...

RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models

... In contrast, OOD generalization addresses heterogeneity arising from distribution shifts between training and testing data, emphasizing performance on diverse, unseen distributions to test broader generalization capabilities. Therefore, unlike prior FL research [5,64,51] focused on in-distribution generalization by evaluating client performance within training environments, OOD generalization in FL requires methods that address distribution shifts both among clients and between training and testing data to ensure robust performance. ...

Perada: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees
  • Citing Conference Paper
  • June 2024

... Although RLHF has been applied primarily in domains such as Large Language Models (LLMs) Chaudhari et al. 2024;Zeng et al. 2024;Zhou et al. 2024), robotics control (Hiranaka et al. 2023, and traffic control (Cao et al. 2024), it also holds significant potential for UI adaptation. While traditional machine learning methods provide a foundation for UI adaptation, RL-especially when augmented with human feedback-offers a more dynamic and responsive framework for developing AUIs. ...

Reinforcement Learning with Human Feedback for Realistic Traffic Simulation
  • Citing Conference Paper
  • May 2024