Aleksander Madry’s research while affiliated with Massachusetts Institute of Technology and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (112)


Figure 21: Performance of our three compute-optimal MPT models [Mos23b; HBM+22].
Compute requirement for attributing our different MPT models [Mos23b].
Hyperparameters used to train our RN-k models. We leverage the µP framework [YHB+22] in order to use the same hyperparameters for all our models of different sizes.
The architecture and hyperparameters of our three MPT models [Mos23b].
Small-to-Large Generalization: Data Influences Models Consistently Across Scale
  • Preprint
  • File available

May 2025

Alaa Khaddaj · Logan Engstrom · Aleksander Madry

The choice of training data distribution greatly influences model behavior. Yet, in large-scale settings, precisely characterizing how changes in training data affect predictions is often difficult due to model training costs. Current practice is instead to extrapolate from scaled-down, inexpensive-to-train proxy models. However, changes in data do not influence smaller and larger models identically. Therefore, understanding how the choice of data affects large-scale models raises the question: how does training data distribution influence model behavior across compute scale? We find that small- and large-scale language model predictions generally correlate highly across choices of training data. Equipped with these findings, we characterize how proxy scale affects effectiveness in two downstream proxy-model applications: data attribution and dataset selection.
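As a rough illustration of the kind of measurement involved, the sketch below uses synthetic numbers standing in for real model outputs: for each held-out example, it computes the rank correlation between the losses a small proxy model and a large model incur under different training distributions. The array shapes and noise level are invented for illustration; the paper's experiments use trained language models rather than this toy data.

    import numpy as np
    from scipy.stats import spearmanr

    # Synthetic stand-in: per-example losses of models trained on K candidate
    # training distributions, at a small (proxy) scale and a large scale.
    # Shape: (num_training_distributions, num_heldout_examples)
    rng = np.random.default_rng(0)
    small_losses = rng.normal(size=(8, 1000))
    large_losses = 0.7 * small_losses + 0.3 * rng.normal(size=(8, 1000))

    # For each held-out example, check whether the ranking of training
    # distributions induced by the proxy model matches the large model's.
    per_example_rho = []
    for i in range(small_losses.shape[1]):
        rho, _ = spearmanr(small_losses[:, i], large_losses[:, i])
        per_example_rho.append(rho)
    print(f"mean rank correlation across examples: {np.mean(per_example_rho):.3f}")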


Figure 2: Comparing true rollout success (M) vs. the proxy metric.
Figure 9: Avg. success on MetaWorld with no-goal conditioning.
Task-wise experimental setup for selecting data, and for training and evaluating policies
DataMIL: Selecting Data for Robot Imitation Learning with Datamodels

May 2025 · 3 Reads

Shivin Dass · Alaa Khaddaj · Logan Engstrom · [...]
Recently, the robotics community has amassed ever larger and more diverse datasets to train generalist robot policies. However, while these policies achieve strong mean performance across a variety of tasks, they often underperform on individual, specialized tasks and require further tuning on newly acquired task-specific data. Combining task-specific data with carefully curated subsets of large prior datasets via co-training can produce better specialized policies, but selecting data naively may actually harm downstream performance. To address this, we introduce DataMIL, a policy-driven data selection framework built on the datamodels paradigm that reasons about data selection in an end-to-end manner, using the policy itself to identify which data points will most improve performance. Unlike standard practices that filter data using human notions of quality (e.g., based on semantic or visual similarity), DataMIL directly optimizes data selection for task success, allowing us to select data that enhance the policy while dropping data that degrade it. To avoid performing expensive rollouts in the environment during selection, we use a novel surrogate loss function on task-specific data, allowing us to use DataMIL in the real world without degrading performance. We validate our approach on a suite of more than 60 simulation and real-world manipulation tasks (most notably showing successful data selection from the Open X-Embodiment datasets), demonstrating consistent gains in success rates and superior performance over multiple baselines. Our results underscore the importance of end-to-end, performance-aware data selection for unlocking the potential of large prior datasets in robotics. More information at https://robin-lab.cs.utexas.edu/datamodels4imitation/
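The datamodels-style selection step can be sketched as follows. Everything here (the inclusion masks, the surrogate-loss values, the selection budget) is synthetic and purely illustrative of the end-to-end idea, not the DataMIL implementation.

    import numpy as np

    # Synthetic setup: each row of `masks` records which prior-dataset examples
    # were included when co-training one policy, and `surrogate_losses` holds
    # the resulting task-specific surrogate loss for that policy.
    rng = np.random.default_rng(0)
    num_policies, num_prior_examples = 1000, 300
    masks = rng.integers(0, 2, size=(num_policies, num_prior_examples)).astype(float)
    true_effect = rng.normal(size=num_prior_examples) / num_prior_examples
    surrogate_losses = masks @ true_effect + 0.01 * rng.normal(size=num_policies)

    # Fit a linear datamodel: an estimated per-example effect on the surrogate loss.
    weights, *_ = np.linalg.lstsq(masks, surrogate_losses, rcond=None)

    # Co-train on the prior examples whose estimated effect most reduces the loss.
    budget = 50
    selected = np.argsort(weights)[:budget]
    print("selected prior-data indices:", selected[:10])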


AI Supply Chains: An Emerging Ecosystem of AI Actors, Products, and Services

April 2025 · 20 Reads

The widespread adoption of AI in recent years has led to the emergence of AI supply chains: complex networks of AI actors contributing models, datasets, and more to the development of AI products and services. AI supply chains have many implications yet are poorly understood. In this work, we take a first step toward a formal study of AI supply chains and their implications, providing two illustrative case studies indicating that both AI development and regulation are complicated in the presence of supply chains. We begin by presenting a brief historical perspective on AI supply chains, discussing how their rise reflects a longstanding shift towards specialization and outsourcing that signals the healthy growth of the AI industry. We then model AI supply chains as directed graphs and demonstrate the power of this abstraction by connecting examples of AI issues to graph properties. Finally, we examine two case studies in detail, providing theoretical and empirical results in both. In the first, we show that information passing (specifically, of explanations) along the AI supply chains is imperfect, which can result in misunderstandings that have real-world implications. In the second, we show that upstream design choices (e.g., by base model providers) have downstream consequences (e.g., on AI products fine-tuned on the base model). Together, our findings motivate further study of AI supply chains and their increasingly salient social, economic, regulatory, and technical implications.
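A toy version of the directed-graph abstraction mentioned above might look like the following; the node names are invented for illustration. Tracing a product's ancestors in the graph surfaces every upstream actor whose design choices can propagate downstream.

    import networkx as nx

    # Toy AI supply chain as a directed graph (illustrative nodes only).
    G = nx.DiGraph()
    G.add_edges_from([
        ("web corpus", "base model provider"),
        ("base model provider", "fine-tuning vendor"),
        ("proprietary dataset", "fine-tuning vendor"),
        ("fine-tuning vendor", "AI product"),
    ])

    # Every upstream actor whose choices can reach the downstream product.
    print(sorted(nx.ancestors(G, "AI product")))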


Learning to Attribute with Attention

April 2025 · 2 Reads

Given a sequence of tokens generated by a language model, we may want to identify the preceding tokens that influence the model to generate this sequence. Performing such token attribution is expensive; a common approach is to ablate preceding tokens and directly measure their effects. To reduce the cost of token attribution, we revisit attention weights as a heuristic for how a language model uses previous tokens. Naive approaches to attribute model behavior with attention (e.g., averaging attention weights across attention heads to estimate a token's influence) have been found to be unreliable. To attain faithful attributions, we propose treating the attention weights of different attention heads as features. This way, we can learn how to effectively leverage attention weights for attribution (using signal from ablations). Our resulting method, Attribution with Attention (AT2), reliably performs on par with approaches that involve many ablations, while being significantly more efficient. To showcase the utility of AT2, we use it to prune less important parts of a provided context in a question answering setting, improving answer quality. We provide code for AT2 at https://github.com/MadryLab/AT2.
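A hedged sketch of the underlying idea (not the released AT2 code): treat each attention head's weight on a preceding token as a feature and learn, from a limited budget of ablation measurements, a mapping from those features to influence. All data below is synthetic, and ridge regression is just one possible choice of learner.

    import numpy as np
    from sklearn.linear_model import Ridge

    # Synthetic stand-ins: per-head attention weights to each preceding token,
    # and ablation-based influence scores used as supervision.
    rng = np.random.default_rng(0)
    num_tokens, num_heads = 2000, 32
    attn_features = rng.random(size=(num_tokens, num_heads))
    ablation_scores = attn_features @ rng.normal(size=num_heads)
    ablation_scores += 0.05 * rng.normal(size=num_tokens)

    # Learn how to combine attention heads into a faithful attribution score.
    scorer = Ridge(alpha=1.0).fit(attn_features, ablation_scores)

    # At attribution time, no further ablations are needed: the score is a
    # cheap weighted combination of attention weights.
    print(scorer.predict(attn_features[:5]))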


Optimizing ML Training with Metagradient Descent

March 2025 · 2 Reads

A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based approach to this problem. We first introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale. We then introduce a "smooth model training" framework that enables effective optimization using metagradients. With metagradient descent (MGD), we greatly improve on existing dataset selection methods, outperform accuracy-degrading data poisoning attacks by an order of magnitude, and automatically find competitive learning rate schedules.
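As a toy illustration of what a metagradient is (far smaller than the paper's setting), the snippet below differentiates a validation loss through ten unrolled SGD steps with respect to the learning rate. The metaparameter, model, and data are all invented; the paper's contribution is computing such gradients at the scale of real training runs.

    import torch

    torch.manual_seed(0)
    X_train, y_train = torch.randn(64, 5), torch.randn(64, 1)
    X_val, y_val = torch.randn(32, 5), torch.randn(32, 1)

    log_lr = torch.tensor(-2.0, requires_grad=True)     # metaparameter
    w = torch.zeros(5, 1, requires_grad=True)           # model weights

    for _ in range(10):                                 # unrolled inner training loop
        train_loss = ((X_train @ w - y_train) ** 2).mean()
        grad_w, = torch.autograd.grad(train_loss, w, create_graph=True)
        w = w - torch.exp(log_lr) * grad_w              # differentiable SGD step

    val_loss = ((X_val @ w - y_val) ** 2).mean()
    metagrad, = torch.autograd.grad(val_loss, log_lr)   # gradient through training
    print(metagrad)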


Figure D.1: Examples of mislabeled or poorly written questions from mathematics benchmarks.
Figure D.2: Examples of mislabeled or poorly written questions from reading comprehension benchmarks.
Figure D.3: Examples of ambiguous questions from VQA V2.0.
Do Large Language Model Benchmarks Test Reliability?

February 2025 · 17 Reads

When deploying large language models (LLMs), it is important to ensure that these models are not only capable, but also reliable. Many benchmarks have been created to track LLMs' growing capabilities; however, there has been no similar focus on measuring their reliability. To understand the potential ramifications of this gap, we investigate how well current benchmarks quantify model reliability. We find that pervasive label errors can compromise these evaluations, obscuring lingering model failures and hiding unreliable behavior. Motivated by this gap in the evaluation of reliability, we then propose the concept of so-called platinum benchmarks, i.e., benchmarks carefully curated to minimize label errors and ambiguity. As a first attempt at constructing such benchmarks, we revise examples from fifteen existing popular benchmarks. We evaluate a wide range of models on these platinum benchmarks and find that, indeed, frontier LLMs still exhibit failures on simple tasks such as elementary-level math word problems. Analyzing these failures further reveals previously unidentified patterns of problems on which frontier models consistently struggle. We provide code at https://github.com/MadryLab/platinum-benchmarks.


FIG. 3. Performance discriminating between real and decoy structures. (a) RLA scores correlate with TM scores for single-chain structure predictions approximately as well as AlphaFold2 confidence scores (AF2Rank). (b) The highest-ranked decoy based on RLA score has a similar TM score to the highest-ranked decoy according to AF2Rank. (c) Approximate compute times per complex for each method used in the design tests. RLA scores discriminate between bad (DockQ < 0.23) and good (DockQ >= 0.23) decoy protein-protein (d) and protein-peptide (f) complexes better than Rosetta and AF2 Initial Guess, even without peptide sequence information. RLA scores correlate with decoy protein-protein (e) and protein-peptide (g) complex DockQ scores better than Rosetta and AF2 Initial Guess. Data shown are mean ± SEM over different targets.
Jointly Embedding Protein Structures and Sequences through Residue Level Alignment

November 2024 · 32 Reads · 2 Citations

PRX Life

The relationships between protein sequences, structures, and functions are determined by complex codes that scientists aim to decipher. While structures contain key information about proteins' biochemical functions, they are often experimentally difficult to obtain. In contrast, protein sequences are abundant but are a step removed from function. In this paper, we propose residue level alignment (RLA)—a self-supervised objective for aligning sequence and structure embedding spaces. By situating sequence and structure encoders within the same latent space, RLA enriches the sequence encoder with spatial information. Moreover, our framework enables us to measure the similarity between a sequence and structure by comparing their RLA embeddings. We show how RLA similarity scores can be used for binder design by selecting true binders from sets of designed binders. RLA scores are informative even when they are calculated given only the backbone structure of the binder and no binder sequence information, which simulates the information available in many early-stage binder design libraries. RLA performs similarly to benchmark methods and is orders of magnitude faster, making it a valuable new screening tool for binder design pipelines. Published by the American Physical Society 2024
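A minimal, self-contained sketch of a residue-level contrastive alignment objective in this spirit follows; random tensors stand in for the actual sequence and structure encoders, and the temperature value is an arbitrary illustrative choice, not the paper's.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    num_residues, dim = 128, 64
    # Stand-ins for per-residue embeddings from a sequence encoder and a
    # structure encoder applied to the same protein.
    seq_emb = F.normalize(torch.randn(num_residues, dim), dim=-1)
    struct_emb = F.normalize(torch.randn(num_residues, dim), dim=-1)

    # Each residue's sequence embedding should match its own structure
    # embedding (diagonal positives) rather than those of other residues.
    logits = seq_emb @ struct_emb.T / 0.07              # temperature-scaled similarities
    targets = torch.arange(num_residues)
    loss = 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

    # A sequence-structure similarity score can then be read off as the mean
    # cosine similarity between aligned residue embeddings.
    score = (seq_emb * struct_emb).sum(dim=-1).mean()
    print(loss.item(), score.item())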


Attribute-to-Delete: Machine Unlearning via Datamodel Matching

October 2024 · 5 Reads

Machine unlearning -- efficiently removing the effect of a small "forget set" of training data on a pre-trained machine learning model -- has recently attracted significant research interest. Despite this interest, however, recent work shows that existing machine unlearning techniques do not hold up to thorough evaluation in non-convex settings. In this work, we introduce a new machine unlearning technique that exhibits strong empirical performance even in such challenging settings. Our starting point is the perspective that the goal of unlearning is to produce a model whose outputs are statistically indistinguishable from those of a model re-trained on all but the forget set. This perspective naturally suggests a reduction from the unlearning problem to that of data attribution, where the goal is to predict the effect of changing the training set on a model's outputs. Thus motivated, we propose the following meta-algorithm, which we call Datamodel Matching (DMM): given a trained model, we (a) use data attribution to predict the output of the model if it were re-trained on all but the forget set points; then (b) fine-tune the pre-trained model to match these predicted outputs. In a simple convex setting, we show how this approach provably outperforms a variety of iterative unlearning algorithms. Empirically, we use a combination of existing evaluations and a new metric based on the KL-divergence to show that even in non-convex settings, DMM achieves strong unlearning performance relative to existing algorithms. An added benefit of DMM is that it is a meta-algorithm, in the sense that future advances in data attribution translate directly into better unlearning algorithms, pointing to a clear direction for future progress in unlearning.
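Schematically, the DMM recipe reduces to the two steps sketched below. Here `predict_retrained_logits` is a hypothetical stand-in for whatever data attribution method supplies the predicted outputs of a model re-trained without the forget set, and the fine-tuning loop is deliberately simplified.

    import torch
    import torch.nn.functional as F

    def datamodel_matching(model, retain_loader, predict_retrained_logits,
                           steps=100, lr=1e-4):
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _, (x, _) in zip(range(steps), retain_loader):
            # (a) attribution-based prediction of the outputs of a model
            #     re-trained on all but the forget set
            target = F.softmax(predict_retrained_logits(x), dim=-1)
            # (b) fine-tune the pre-trained model to match those predictions
            loss = F.kl_div(F.log_softmax(model(x), dim=-1), target,
                            reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
        return model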


ContextCite: Attributing Model Generation to Context

September 2024 · 3 Reads · 1 Citation

How do language models use information provided as context when generating a response? Can we infer whether a particular generated statement is actually grounded in the context, a misinterpretation, or fabricated? To help answer these questions, we introduce the problem of context attribution: pinpointing the parts of the context (if any) that led a model to generate a particular statement. We then present ContextCite, a simple and scalable method for context attribution that can be applied on top of any existing language model. Finally, we showcase the utility of ContextCite through three applications: (1) helping verify generated statements, (2) improving response quality by pruning the context, and (3) detecting poisoning attacks. We provide code for ContextCite at https://github.com/MadryLab/context-cite.
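A hedged sketch of one way to realize context attribution via ablations follows; the scoring function and the sparse linear surrogate are illustrative choices for this sketch, not necessarily the ones ContextCite makes.

    import numpy as np
    from sklearn.linear_model import Lasso

    def attribute_context(num_sources, score_statement, num_ablations=64, seed=0):
        """score_statement(mask) is a hypothetical stand-in for re-scoring the
        generated statement with only the masked-in context sources present."""
        rng = np.random.default_rng(seed)
        masks = rng.integers(0, 2, size=(num_ablations, num_sources))
        scores = np.array([score_statement(mask) for mask in masks])
        surrogate = Lasso(alpha=0.01).fit(masks, scores)
        return surrogate.coef_       # one attribution weight per context source

    # Synthetic example in which context source 3 drives the statement.
    rng = np.random.default_rng(1)
    weights = attribute_context(10, lambda m: 2.0 * m[3] + 0.1 * rng.random())
    print(np.argsort(weights)[::-1][:3])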


Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection

June 2024 · 21 Reads

Machine learning models can fail on subgroups that are underrepresented during training. While techniques such as dataset balancing can improve performance on underperforming groups, they require access to training group annotations and can end up removing large portions of the dataset. In this paper, we introduce Data Debiasing with Datamodels (D3M), a debiasing approach which isolates and removes specific training examples that drive the model's failures on minority groups. Our approach enables us to efficiently train debiased classifiers while removing only a small number of examples, and does not require training group annotations or additional hyperparameter tuning.
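The selection step can be sketched as follows, assuming datamodel-style scores estimating each training example's effect on the worst-performing group's loss are already available; the scores below are random placeholders, not outputs of the D3M pipeline.

    import numpy as np

    rng = np.random.default_rng(0)
    num_train = 10_000
    # Placeholder scores: estimated effect of each training example on the loss
    # of the worst-performing subgroup (positive = raises that group's loss).
    group_loss_effect = rng.normal(size=num_train)

    # Drop only the examples that most drive the minority-group failures,
    # then retrain the classifier on the remainder.
    num_to_drop = 200
    drop = np.argsort(group_loss_effect)[-num_to_drop:]
    keep = np.setdiff1d(np.arange(num_train), drop)
    print(f"kept {keep.size} of {num_train} training examples")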


Citations (60)


... In S-PLM2, contrastive learning is applied at the residue level to enable more detailed alignment between sequence and structure. As highlighted in a prior study [33], aligning sequence and structure embeddings at the residue level allows for the integration of spatial information from the structure, enhancing the richness of sequence-based representations. This residue-level alignment provides the model with the ability to capture fine-grained structural details, improving its effectiveness in tasks such as secondary structure prediction. ...

Reference:

Enhancing Structure-aware Protein Language Models with Efficient Fine-tuning for Various Protein Prediction Tasks
Jointly Embedding Protein Structures and Sequences through Residue Level Alignment

PRX Life

... By enforcing alignment with monkey IT representations, models exhibited both enhanced adversarial robustness and increased behavioral alignment with human subjects. Another study found that model metamers (artificial stimuli, generated by robust models, that elicit the same response as natural stimuli) are more recognizable to humans, though robustness is not itself predictive of recognizability [22]. However, studies using Cohen's Kappa report that robust models still diverge from humans in their error patterns [14]. ...

Model metamers reveal divergent invariances between biological and artificial neural networks

Nature Neuroscience

... Estimating influence at the level of individual training examples is often noisy: because each training example is seen only a few times during training, its individual effect on policy performance is often marginal and difficult to measure precisely. To reduce this variance, prior work clusters samples by class or task and then evaluates cluster-level influence instead of individual examples [53,54]. ...

A Data-Based Perspective on Transfer Learning
  • Citing Conference Paper
  • June 2023

... These solutions generally follow three patterns: custom file formats for efficient streaming, asynchronous loading mechanisms, and shared caching systems. FFCV (Leclerc et al., 2023) combines efficient file formats with asynchronous transfers to maximize GPU utilization. WebDataset (Aizman et al., 2020) uses tar archives to enable sequential reads from cloud storage, achieving ten-fold improvements over traditional approaches. ...

FFCV: Accelerating Training by Removing Data Bottlenecks
  • Citing Conference Paper
  • June 2023

... Making these recommendation settings more transparent would be the first step toward helping users understand how they function and make better content decisions. Additionally, by promoting positive discourse in online forums, you can gain exposure to different points of view (Cen, Madry, & Shah, 2023). To guarantee a good experience for all users, Fazelpour and Danks (2021) stress the significance of neutral and fair environments. ...

A User-Driven Framework for Regulating and Auditing Social Media

... doi.org/10.32388/BOJDXM outperforms the existing data attribution methods for diffusion models as measured by common data attribution metrics like the Linear Data-modelling Score [16] or retraining without top influences. Finally, we also discuss interesting empirical observations that challenge our current understanding of influence functions in the context of diffusion models. ...

TRAK: Attributing Model Behavior at Scale
  • Citing Preprint
  • March 2023

... [4] developed an almost linear (i.e., O(m^{1+ε})) time algorithm for the more general minimum cost flow problem. This follows an interior point methodology initially developed by Madry [14] in the context of the maximum flow problem, which in turn follows the development of near-linear time algorithms for solving Laplacian linear systems of equations [15]. There is a great deal of work developing the latter ideas involving a number of contributors, for which we refer the interested reader to the survey of Cruz-Mejía and Letchford [5]. ...

Computing Maximum Flow with Augmenting Electrical Flows
  • Citing Conference Paper
  • October 2016

... In all cases, we implement the inference layer as a linear layer feeding on concept logits. Following the original implementations, LF-CBM and VLG-CBM incentivize sparsity by using GLM-SAGA [79], while CBM and Naïve minimize the unregularized cross entropy loss and LaBo regularizes the linear layer by applying a softmax operator to the weight rows [80]. All details regarding architectures and hyperparameter tuning are reported in Sections A.1 and A.4. ...

Leveraging Sparse Linear Layers for Debuggable Deep Networks
  • Citing Article
  • January 2021

... For experiments, we use a ResNet-50 [25], which was adversarially trained on ImageNet [26,27], as our CNN backbone. This architecture has shown good fits to neural data in several studies [28,29,30]. To predict the response to an image, we feed it through the ResNet up to the last ReLU of layer 4.1. ...

Model metamers illuminate divergences between biological and artificial neural networks