
Cho-Jui Hsieh - University of Texas at Austin
About
Publications: 263
Reads: 33,189
Citations: 16,766
Publications (263)
Branch-and-bound (BaB) is among the most effective techniques for neural network (NN) verification. However, existing works on BaB for NN verification have mostly focused on NNs with piecewise linear activations, especially ReLU networks. In this paper, we develop a general framework, named GenBaB, to conduct BaB on general nonlinearities to verify...
Large Language Models (LLMs) have shown tremendous potential as agents, excelling at tasks that require multiple rounds of reasoning and interactions. Rejection Sampling Fine-Tuning (RFT) has emerged as an effective method for finetuning LLMs as agents: it first imitates expert-generated successful trajectories and further improves agentic skills t...
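As an illustrative aside, the rejection-sampling step described above can be sketched in a few lines. This is a minimal toy, not the paper's implementation; `generate_trajectory`, `is_successful`, and `fine_tune` are placeholder stubs standing in for the real agent rollout, rule-based success check, and trainer.

```python
import random

# Toy stand-ins for the real agent rollout, success check, and trainer;
# an actual RFT pipeline would use an LLM agent and environment instead.
def generate_trajectory(model, task):
    return {"task": task, "actions": [random.random() for _ in range(3)]}

def is_successful(traj, task):
    return max(traj["actions"]) > 0.7        # rule-based reward as a placeholder

def fine_tune(model, trajectories):
    return {"base": model, "sft_data": trajectories}

def rejection_sampling_finetune(model, tasks, samples_per_task=8):
    """Sample several trajectories per task, keep only the successful ones,
    then fine-tune on the retained (expert-like) trajectories."""
    kept = []
    for task in tasks:
        for _ in range(samples_per_task):
            traj = generate_trajectory(model, task)
            if is_successful(traj, task):
                kept.append(traj)
    return fine_tune(model, kept)

tuned = rejection_sampling_finetune(model="base-llm", tasks=["t1", "t2"])
```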
Recently, DeepSeek R1 demonstrated how reinforcement learning with simple rule-based incentives can enable autonomous development of complex reasoning in large language models, characterized by the "aha moment", in which the model manifests self-reflection and increased response length during training. However, attempts to extend this success to mult...
In recent years, many neural network (NN) verifiers have been developed to formally verify certain properties of neural networks such as robustness. Although many benchmarks have been constructed to evaluate the performance of NN verifiers, they typically lack a ground-truth for hard instances where no current verifier can verify and no counterexam...
We study the problem of learning Lyapunov-stable neural controllers which provably satisfy the Lyapunov asymptotic stability condition within a region-of-attraction. Compared to previous works which commonly used counterexample guided training on this task, we develop a new and generally formulated certified training framework named CT-BaB, and we...
Pretrained Large Language Models (LLMs) require post-training methods such as supervised fine-tuning (SFT) on instruction-response pairs to enable instruction following. However, this process can potentially harm existing capabilities learned during pretraining. In this paper, we investigate the loss of context awareness after SFT, defined as the c...
Low-rank adaptation (LoRA) is a widely used parameter-efficient finetuning method for LLMs that reduces memory requirements. However, current LoRA optimizers lack transformation invariance, meaning the actual updates to the weights depend on how the two LoRA factors are scaled or rotated. This deficiency leads to inefficient learning and sub-optimal...
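The transformation-invariance issue is easy to see in a toy numpy sketch (not the paper's method): rescaling the two LoRA factors leaves the effective weight W = BA unchanged, yet one plain gradient step on the rescaled factors yields a different effective update. The upstream gradient g below is arbitrary and stands in for a real loss.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 6, 2
A = rng.normal(size=(r, d))         # LoRA "down" factor
B = rng.normal(size=(d, r))         # LoRA "up" factor
x = rng.normal(size=d)
g = rng.normal(size=d)              # arbitrary upstream gradient w.r.t. the layer output

s = 10.0                            # rescale the factors: (B/s)(sA) = BA
B2, A2 = B / s, A * s
assert np.allclose(B @ A, B2 @ A2)  # same effective weight W = BA

# One plain gradient-descent step on each parametrization...
lr = 0.1
grad_B, grad_A = np.outer(g, A @ x), np.outer(B.T @ g, x)
grad_B2, grad_A2 = np.outer(g, A2 @ x), np.outer(B2.T @ g, x)
W_new  = (B - lr * grad_B)  @ (A - lr * grad_A)
W2_new = (B2 - lr * grad_B2) @ (A2 - lr * grad_A2)

# ...yields different effective weights: the update depends on the scaling.
print(np.allclose(W_new, W2_new))   # False
```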
Large Language Models (LLMs) have demonstrated remarkable proficiency in various natural language generation (NLG) tasks. Previous studies suggest that LLMs' generation process involves uncertainty. However, existing approaches to uncertainty estimation mainly focus on sequence-level uncertainty, overlooking individual pieces of information within...
Discrimination-aware classification methods remedy socioeconomic disparities exacerbated by machine learning systems. In this paper, we propose a novel data pre-processing technique that assigns weights to training instances in order to reduce discrimination without changing any of the inputs or labels. While the existing reweighing approach only l...
In the rapidly evolving landscape of artificial intelligence, generative models such as Generative Adversarial Networks (GANs) and Diffusion Models have become cornerstone technologies, driving innovation in diverse fields from art creation to healthcare. Despite their potential, these models face the significant challenge of data memorization, whi...
Large Language Models (LLMs) have demonstrated remarkable performance in solving math problems, a hallmark of human intelligence. However, despite high success rates on current benchmarks, these benchmarks often feature simple problems with only one or two unknowns, which do not sufficiently challenge their reasoning capacities. This paper introduces a novel...
Large Language Models (LLMs) exhibit strong generalization capabilities to novel tasks when prompted with language instructions and in-context demos. Since this ability sensitively depends on the quality of prompts, various methods have been explored to automate the instruction design. While these methods demonstrated promising results, they also r...
This paper introduces the first gradient-based framework for prompt optimization in text-to-image diffusion models. We formulate prompt engineering as a discrete optimization problem over the language space. Two major challenges arise in efficiently finding a solution to this problem: (1) Enormous Domain Space: Setting the domain to the entire lang...
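For intuition, a common recipe for this kind of discrete prompt optimization (not necessarily the paper's exact algorithm) is to relax tokens to continuous embeddings, run gradient ascent on a differentiable score, and project back to the nearest vocabulary token. A toy numpy sketch with a made-up quadratic score standing in for the real image-quality objective:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = rng.normal(size=(50, 8))        # toy vocabulary embeddings (50 tokens, dim 8)
target = rng.normal(size=8)             # stand-in for a direction that improves the score

def score_grad(e):
    return -2.0 * (e - target)          # gradient of the toy surrogate score

e = vocab[0].copy()                     # relax one prompt token to its continuous embedding
for _ in range(200):
    e += 0.05 * score_grad(e)           # gradient ascent in the continuous embedding space

token = int(np.argmin(np.linalg.norm(vocab - e, axis=1)))
print("projected back to token id:", token)   # snap to the nearest discrete token
```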
The trade-off between expressiveness and interpretability remains a core challenge when building human-centric predictive models for classification and decision-making. While symbolic rules offer interpretability, they often lack expressiveness, whereas neural networks excel in performance but are known for being black boxes. In this paper, we show...
Dataset Condensation has emerged as a technique for compressing large datasets into smaller synthetic counterparts, facilitating downstream training tasks. In this paper, we study the impact of bias inside the original dataset on the performance of dataset condensation. With a comprehensive empirical evaluation on canonical datasets with color, cor...
The concept of negative prompts, emerging from conditional generation models like Stable Diffusion, allows users to specify what to exclude from the generated images. Despite the widespread use of negative prompts, their intrinsic mechanisms remain largely unexplored. This paper presents the first com...
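For background, negative prompts are commonly implemented through classifier-free guidance, with the negative-prompt prediction replacing the unconditional branch; a schematic numpy sketch (array shapes and the guidance scale are illustrative placeholders):

```python
import numpy as np

def guided_noise(eps_pos, eps_neg, guidance_scale=7.5):
    """Classifier-free guidance with a negative prompt: the denoiser output for
    the negative prompt replaces the unconditional branch, steering the sample
    away from the excluded concepts."""
    return eps_neg + guidance_scale * (eps_pos - eps_neg)

eps_pos = np.random.randn(4, 64, 64)   # denoiser output conditioned on the prompt
eps_neg = np.random.randn(4, 64, 64)   # denoiser output conditioned on the negative prompt
eps = guided_noise(eps_pos, eps_neg)
```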
Diffusion models have achieved remarkable success in text-to-image generation tasks; however, the role of initial noise has been rarely explored. In this study, we identify specific regions within the initial noise image, termed trigger patches, that play a key role for object generation in the resulting images. Notably, these patches are "univers...
Large Language Models (LLMs) require careful safety alignment to prevent malicious outputs. While significant research focuses on mitigating harmful content generation, the enhanced safety often comes with the side effect of over-refusal, where the LLMs may reject innocuous prompts and become less helpful. Although the issue of over-refusal has been...
Branch-and-bound (BaB) is among the most effective methods for neural network (NN) verification. However, existing works on BaB have mostly focused on NNs with piecewise linear activations, especially ReLU networks. In this paper, we develop a general framework, named GenBaB, to conduct BaB for general nonlinearities in general computational graphs...
The prevalence and strong capability of large language models (LLMs) present significant safety and ethical risks if exploited by malicious users. To prevent the potentially deceptive usage of LLMs, recent work has proposed algorithms to detect LLM-generated text and protect LLMs. In this paper, we investigate the robustness and reliability of thes...
Deep neural networks have performed remarkably in many areas, including image-related classification tasks. However, various studies have shown that they are vulnerable to adversarial examples – images carefully crafted to fool well-trained deep neural networks by introducing imperceptible perturbations to the original images. To better understand...
Trust region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties for nonconvex optimization by concurrently computing function value, gradient, and Hessian matrix to obtain the next search direction and the adjusted parameters. Although stochastic approximations help largely reduce the...
Artificial intelligence (AI) image translation has been a valuable tool for processing image data in biological and medical research. To apply such a tool in mission-critical applications, including drug screening, toxicity study, and clinical diagnostics, it is essential to ensure that the AI prediction is trustworthy. Here, we demonstrate that an...
Extreme multi-label classification aims to learn a classifier that annotates an instance with a relevant subset of labels from an extremely large label set. Many existing solutions embed the label matrix to a low-dimensional linear subspace, or examine the relevance of a test instance to every label via a linear scan. In practice, however, those ap...
The prevalence and high capacity of large language models (LLMs) present significant safety and ethical risks when malicious users exploit them for automated content generation. To prevent the potentially deceptive usage of LLMs, recent works have proposed several algorithms to detect machine-generated text. In this paper, we systematically test th...
We introduce a novel class of sample-based explanations we term high-dimensional representers, which can be used to explain the predictions of a regularized high-dimensional model in terms of importance weights for each of the training samples. Our workhorse is a novel representer theorem for general regularized high-dimensional models, which decomp...
Despite the demonstrated empirical efficacy of prompt tuning to adapt a pretrained language model for a new task, the theoretical underpinnings of the difference between "tuning parameters before the input" against "the tuning of model weights" are limited. We thus take one of the first steps to understand the role of soft-prompt tuning for transfo...
Lipschitz bandit is a variant of stochastic bandits that deals with a continuous arm set defined on a metric space, where the reward function is subject to a Lipschitz constraint. In this paper, we introduce a new problem of Lipschitz bandits in the presence of adversarial corruptions where an adaptive adversary corrupts the stochastic rewards up t...
The eXtreme Multi-label Classification (XMC) problem seeks to find relevant labels from an exceptionally large label space. Most of the existing XMC learners focus on the extraction of semantic features from input query text. However, conventional XMC studies usually neglect the side information of instances and labels, which can be of use in many...
In this paper, we define, evaluate, and improve the "relay-generalization" performance of reinforcement learning (RL) agents on the out-of-distribution "controllable" states. Ideally, an RL agent that generally masters a task should reach its goal starting from any controllable state of the environment instead of memorizing a small set of traje...
Most existing video-language pre-training methods focus on instance-level alignment between video clips and captions via global contrastive learning but neglect rich fine-grained local information, which is of importance to downstream tasks requiring temporal localization and semantic reasoning. In this work, we propose a simple yet effective vi...
In stochastic contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience to minimize the cumulative regret. Like many other machine learning algorithms, the performance of bandits heavily depends on the values of hyperparameters, and theoretically derived parameter values may lead to unsatisfac...
"Effective robustness" measures the extra out-of-distribution (OOD) robustness beyond what can be predicted from the in-distribution (ID) performance. Existing effective robustness evaluations typically use a single test set such as ImageNet to evaluate ID accuracy. This becomes problematic when evaluating models trained on different data distribut...
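As a minimal illustration of the quantity being evaluated, effective robustness is typically the residual of a model's OOD accuracy above the ID-to-OOD trend fitted on a pool of baseline models (often on logit-transformed accuracies; a plain linear fit and made-up numbers are used here for simplicity):

```python
import numpy as np

# Toy accuracies for a pool of baseline models (in-distribution vs. OOD),
# plus one model whose extra OOD robustness we want to quantify.
id_acc  = np.array([0.70, 0.75, 0.80, 0.85, 0.90])
ood_acc = np.array([0.40, 0.46, 0.52, 0.58, 0.64])

slope, intercept = np.polyfit(id_acc, ood_acc, deg=1)   # baseline ID -> OOD trend

model_id, model_ood = 0.82, 0.60
effective_robustness = model_ood - (slope * model_id + intercept)
print(f"effective robustness: {effective_robustness:+.3f}")
```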
Mesenchymal stromal cells (MSCs) offer promising potential in biomedical research, clinical therapeutics, and immunomodulatory therapies due to their ease of isolation and multipotent, immunoprivileged, and immunosuppressive properties. Extensive efforts have focused on optimizing the cell isolation and culture methods to generate scalable, therapeu...
Physical adversarial attacks refer to the realization of a physical adversarial object that evades the prediction of a machine learning model. This chapter summarizes principles to generate such adversarial objects and introduces several realizations of such physical adversarial examples.
Although computing the exact robustness radius via complete verification is difficult, recent progress has enabled complete verification for realistic neural networks on medium-scale datasets. In this chapter, we discuss the recent progress in complete verification.
In this chapter, we discuss the robustness verification problem for models other than neural networks, including tree-based models and nearest-neighbor classifiers. Due to the discrete nature of these models, their robustness evaluation methods are very different from those for neural networks. We discuss how to evaluate and enhance the robustness of these mode...
This chapter starts by introducing the taxonomy of evasion attacks based on the knowledge of an attacker. Then this chapter provides a deep dive into three black-box attack settings: soft-label, hard-label, and transfer attacks.
In this chapter, we discuss several methods for detecting adversarial inputs or compromised models against training-phase and deployment-phase attacks.
Besides adversarial training, another successful defense approach is to inject randomness into the model. In this chapter, we introduce this family of randomization-based defense methods.
In this chapter, we study how adversarial robustness can be transferred in two popular machine learning paradigms, meta-learning (metamodel training and few-shot adaptation) and contrastive learning (pretraining and fine-tuning).
This chapter introduces two types of training-phase adversarial attacks, poisoning attack and backdoor attack. Then we provide a case study on the robustness of federated learning algorithms against distributed backdoor attacks.
In this chapter, we introduce how the techniques used in adversarial attacks, including backdoors and adversarial examples, can be leveraged for model watermarking and fingerprinting, which are crucial to model ownership verification and model governance.
Model reprogramming can be viewed as a novel way of data-efficient transfer learning without changing the pretrained model parameters from the source domain. The methodology was originally inspired by adversarial machine learning due to its secret nature in repurposing a trained model to perform a new task that is not designed by the model develope...
In this chapter, we introduce the procedure of adversarial attacks and describe how to conduct an adversarial attack in the white-box setting, where the attacker is assumed to have full knowledge of the target model. In this setting, we show that adversarial attacks can be formulated as optimization problems and discuss a widely used gradient-base...
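A minimal white-box example in that spirit, using the fast gradient sign method on a toy logistic-regression "model" whose weights are arbitrary placeholders (in the real setting the gradient would come from the target network itself):

```python
import numpy as np

# A tiny logistic-regression model with fixed, arbitrary weights.
w, b = np.array([1.5, -2.0, 0.5]), 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_x(x, y):
    """Gradient of the cross-entropy loss w.r.t. the *input* x."""
    p = sigmoid(w @ x + b)
    return (p - y) * w

x, y = np.array([0.2, 0.4, -0.1]), 1.0
eps = 0.1
x_adv = x + eps * np.sign(loss_grad_x(x, y))   # FGSM: one signed-gradient step
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))   # confidence drops on the perturbed input
```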
In this chapter, we discuss certified robust training methods based on incomplete verification methods.
This chapter introduces mathematical notations and machine learning basics, and provides examples to motivate the study of adversarial robustness for machine learning. Finally, we provide some links to open-source Python-based libraries for adversarial robustness.
In this chapter, we show how the techniques for generating adversarial examples can be used to generate a type of post-hoc local explanations for a given classifier, known as “contrastive explanations”. The technique applies to both white-box and black-box models, similarly to assumed knowledge of a target model in adversarial attacks.
In this chapter, we introduce incomplete neural network verification methods. Instead of computing the exact robustness score (shortest distance from a sample to the decision boundary), they aim to find a lower bound of it using efficient bound propagation algorithms. We introduce the convex relaxation framework, which serves as a foundation of alm...
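As a concrete, deliberately loose instance of bound propagation, interval bound propagation pushes an ℓ∞ input box through each layer; a numpy sketch on a small random network (weights and perturbation radius are illustrative):

```python
import numpy as np

def ibp_affine(lo, hi, W, b):
    """Propagate an input box [lo, hi] through x -> Wx + b."""
    mid, rad = (lo + hi) / 2.0, (hi - lo) / 2.0
    mid_out = W @ mid + b
    rad_out = np.abs(W) @ rad
    return mid_out - rad_out, mid_out + rad_out

def ibp_relu(lo, hi):
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)
eps = 0.1
lo, hi = x - eps, x + eps                       # l_inf ball around the input
lo, hi = ibp_relu(*ibp_affine(lo, hi, W1, b1))  # hidden-layer bounds
lo, hi = ibp_affine(lo, hi, W2, b2)             # output bounds
print("certified output range:", lo, hi)
```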
Adversarial defense methods have been developed to improve the robustness of neural networks. In this chapter, we discuss the challenges of developing and evaluating a defense method and describe the difference between empirical defense and certified defense. We also briefly introduce several representative defense techniques by categorizing them i...
This chapter provides some examples and the rule of thumb for conducting adversarial attacks on tasks other than classification and data modalities other than images.
Adversarial training is one of the most popular methods to improve the adversarial robustness of neural network models. As the goal of adversarial defense is training a model to minimize the robust error against worst-case perturbations within a certain distance, this training objective can be naturally written as a bilevel optimization, where an a...
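A minimal PyTorch sketch of that bilevel structure, with projected gradient descent as the inner maximizer; the architecture, data, and hyperparameters are illustrative placeholders:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()
x, y = torch.randn(16, 10), torch.randint(0, 2, (16,))
eps, alpha, steps = 0.1, 0.02, 5

# Inner problem: approximately maximize the loss within the eps-ball (PGD).
delta = torch.zeros_like(x, requires_grad=True)
for _ in range(steps):
    loss = loss_fn(model(x + delta), y)
    grad, = torch.autograd.grad(loss, delta)
    delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)

# Outer problem: minimize the loss on the worst-case perturbed inputs.
opt.zero_grad()
loss_fn(model(x + delta.detach()), y).backward()
opt.step()
```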
Current studies on adversarial examples focus on supervised learning tasks, relying on the ground-truth data label, a targeted objective, or supervision from a trained classifier. In this chapter, we introduce a framework of generating adversarial examples for unsupervised models and demonstrate novel applications to data augmentation. When using t...
Adversarial attacks, despite being an effective tool to find “bugs” in machine learning models, do not provide a reliable and provable measurement of robustness. For applications such as self-driving cars and flight control systems, 100% safety has to be guaranteed. Neural network verification aims to provide provable safety and robustness guarante...
This chapter focuses on how to extend the robustness verification tools for ℓp norms to a set of semantic perturbations for images, such as semantic changes in hue, saturation, lightness, brightness, contrast, and rotation.
Dataset distillation methods aim to compress a large dataset into a small set of synthetic samples, such that when being trained on, competitive performances can be achieved compared to regular training on the entire dataset. Among recently proposed methods, Matching Training Trajectories (MTT) achieves state-of-the-art performance on CIFAR-10/100,...
The success of AlphaZero (AZ) has demonstrated that neural-network-based Go AIs can surpass human performance by a large margin. Given that the state space of Go is extremely large and a human player can play the game from any legal state, we ask whether adversarial states exist for Go AIs that may lead them to play surprisingly wrong actions. In t...
Pretrained large language models (LLMs) are strong in-context learners that are able to perform few-shot learning without changing model parameters. However, as we show, fine-tuning an LLM on any specific task generally destroys its in-context ability. We discover an important cause of this loss, format specialization, where the model overfits to t...
Recent decades have witnessed great advances of deep learning in tackling various problems such as classification and decision making. The rapid development gave rise to a novel framework, Learning-to-Learn (L2L), in which an automatic optimization algorithm (optimizer) modeled by neural networks is expected to learn rules for updating the target o...
In this paper, we study the robustness of graph convolutional networks (GCNs). Previous works have shown that GCNs are vulnerable to adversarial perturbation on adjacency or feature matrices of existing nodes; however, such attacks are usually unrealistic in real applications. For instance, in social network applications, the attacker will need to...
Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks and has drawn increasing attention from the Natural Language Processing (NLP) community. Despite the surge of new AED methods, our studies show that existing methods heavily rely on a shortcut to achieve good performance. In other words, current search-b...
Generative adversarial network (GAN) continues to be a popular research direction due to its high generation quality. It is observed that many state-of-the-art GANs generate samples that are more similar to the training set than a holdout testing set from the same distribution, hinting that some training samples are implicitly memorized in these models....
Uncertainty quantification is one of the most crucial tasks to obtain trustworthy and reliable machine learning models for decision making. However, most research in this domain has only focused on problems with small label spaces and ignored eXtreme Multi-label Classification (XMC), which is an essential task in the era of big data for web-scale m...
Extreme multi-label classification (XMC) is a popular framework for solving many real-world problems that require accurate prediction from a very large number of potential output choices. A popular approach for dealing with the large label space is to arrange the labels into a shallow tree-based index and then learn an ML model to efficiently searc...
Large pre-trained language models (PLMs) have proven to be a crucial component of modern natural language processing systems. PLMs typically need to be fine-tuned on task-specific downstream datasets, which makes it hard to claim the ownership of PLMs and protect the developer's intellectual property due to the catastrophic forgetting phenomenon. W...
Lipschitz constants are connected to many properties of neural networks, such as robustness, fairness, and generalization. Existing methods for computing Lipschitz constants either produce relatively loose upper bounds or are limited to small networks. In this paper, we develop an efficient framework for computing the $\ell_\infty$ local Lipschitz...
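For contrast with the local bounds developed in the paper, the classical baseline is the product of per-layer ℓ∞ operator norms, which is valid for ReLU networks (ReLU is 1-Lipschitz) but typically very loose; a numpy sketch with random weights:

```python
import numpy as np

def linf_operator_norm(W):
    """The l_inf -> l_inf operator norm of a linear map is the max absolute row sum."""
    return np.abs(W).sum(axis=1).max()

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 32)), rng.normal(size=(32, 64)), rng.normal(size=(10, 32))]

# Product of per-layer operator norms: a valid but usually very loose
# global upper bound on the network's l_inf Lipschitz constant.
upper_bound = np.prod([linf_operator_norm(W) for W in layers])
print(f"naive global l_inf Lipschitz upper bound: {upper_bound:.2f}")
```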
Visual 2.5D perception involves understanding the semantics and geometry of a scene through reasoning about object relationships with respect to the viewer. However, existing works in visual recognition primarily focus on the semantics. To bridge this gap, we study 2.5D visual relationship detection (2.5VRD), in which the goal is to jointly detect...
Concept-based interpretations of black-box models are often more intuitive for humans to understand. The most widely adopted approach for concept-based interpretation is Concept Activation Vector (CAV). CAV relies on learning a linear relation between some latent representation of a given model and concepts. The linear separability is usually impli...
Bound propagation methods, when combined with branch and bound, are among the most effective methods to formally verify properties of deep neural networks such as correctness, robustness, and safety. However, existing works cannot handle the general form of cutting plane constraints widely accepted in traditional solvers, which are crucial for stre...
The mechanical properties of tissues have profound impacts on a wide range of biological processes such as embryo development (1, 2), wound healing (3–6), and disease progression (7). Specifically, the spatially varying moduli of cells largely influence the local tissue deformation and intercellular interaction. Despite the importance of characteri...
Federated learning (FL) has recently attracted increasing attention from academia and industry, with the ultimate goal of achieving collaborative training under privacy and communication constraints. Existing iterative model averaging based FL algorithms require a large number of communication rounds to obtain a well-performing model due to extremel...
Dataset Condensation is a newly emerging technique aiming at learning a tiny dataset that captures the rich information encoded in the original dataset. As the size of datasets contemporary machine learning models rely on becomes increasingly large, condensation methods become a prominent direction for accelerating network training and reducing dat...
Adversarial training has become one of the most effective methods for improving the robustness of neural networks. However, it often suffers from poor generalization on both clean and perturbed data. Current robust training methods always use a uniform perturbation strength for every sample to generate adversarial examples during model training for i...
Existing studies have demonstrated that adversarial examples can be directly attributed to the presence of non-robust features, which are highly predictive, but can be easily manipulated by adversaries to fool NLP models. In this study, we explore the feasibility of capturing task-specific robust features, while eliminating the non-robust ones by u...
Efficient performance estimation of architectures drawn from large search spaces is essential to Neural Architecture Search. One-Shot methods tackle this challenge by training one supernet to approximate the performance of every architecture in the search space via weight-sharing, thereby drastically reducing the search cost. However, due to couple...
Interval Bound Propagation (IBP) is so far the basis of state-of-the-art methods for training neural networks with certifiable robustness guarantees when potential adversarial perturbations are present, while the convergence of IBP training remains unknown in the existing literature. In this paper, we present a theoretical analysis on the convergence of IBP...
Recently, Sharpness-Aware Minimization (SAM), which connects the geometry of the loss landscape and generalization, has demonstrated significant performance boosts on training large-scale models such as vision transformers. However, the update rule of SAM requires two sequential (non-parallelizable) gradient computations at each step, which can dou...
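The two sequential gradient evaluations that make SAM costly look roughly like this in PyTorch; this is a bare-bones sketch of the standard SAM step (model, data, rho, and lr are illustrative), not the method proposed in the paper:

```python
import torch

model = torch.nn.Linear(10, 2)
loss_fn = torch.nn.CrossEntropyLoss()
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
rho, lr = 0.05, 0.1

# First gradient: at the current weights.
loss_fn(model(x), y).backward()
grads = [p.grad.clone() for p in model.parameters()]
norm = torch.sqrt(sum((g ** 2).sum() for g in grads))

# Perturb weights to the (approximate) worst point in the rho-ball.
with torch.no_grad():
    for p, g in zip(model.parameters(), grads):
        p.add_(rho * g / (norm + 1e-12))

# Second gradient: at the perturbed weights; this one drives the update.
model.zero_grad()
loss_fn(model(x), y).backward()
with torch.no_grad():
    for p, g in zip(model.parameters(), grads):
        p.sub_(rho * g / (norm + 1e-12))       # undo the perturbation
    for p in model.parameters():
        p.sub_(lr * p.grad)                     # SGD step with the SAM gradient
```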