Michael Maire’s research while affiliated with University of Chicago and other places


Publications (64)


Nested Diffusion Models Using Hierarchical Latent Priors
  • Preprint

December 2024 · 2 Reads

Xiao Zhang · Ruoxi Jiang · Rebecca Willett · Michael Maire

We introduce nested diffusion models, an efficient and powerful hierarchical generative framework that substantially enhances the generation quality of diffusion models, particularly for images of complex scenes. Our approach employs a series of diffusion models to progressively generate latent variables at different semantic levels. Each model in this series is conditioned on the output of the preceding higher-level models, culminating in image generation. Hierarchical latent variables guide the generation process along predefined semantic pathways, allowing our approach to capture intricate structural details while significantly improving image quality. To construct these latent variables, we leverage a pre-trained visual encoder, which learns strong semantic visual representations, and modulate its capacity via dimensionality reduction and noise injection. Across multiple datasets, our system demonstrates significant enhancements in image quality for both unconditional and class/text conditional generation. Moreover, our unconditional generation system substantially outperforms the baseline conditional system. These advancements incur minimal computational overhead as the more abstract levels of our hierarchy work with lower-dimensional representations.


PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement

November 2024

Tewodros Ayalew · Xiao Zhang · Kevin Yuanbo Wu · [...] · Matthew R. Walter

We present PROGRESSOR, a novel framework that learns a task-agnostic reward function from videos, enabling policy training through goal-conditioned reinforcement learning (RL) without manual supervision. Underlying this reward is an estimate of the distribution over task progress as a function of the current, initial, and goal observations that is learned in a self-supervised fashion. Crucially, PROGRESSOR refines rewards adversarially during online RL training by pushing back predictions for out-of-distribution observations, to mitigate distribution shift inherent in non-expert observations. Utilizing this progress prediction as a dense reward together with an adversarial push-back, we show that PROGRESSOR enables robots to learn complex behaviors without any external supervision. Pretrained on large-scale egocentric human video from EPIC-KITCHENS, PROGRESSOR requires no fine-tuning on in-domain task-specific data for generalization to real-robot offline RL under noisy demonstrations, outperforming contemporary methods that provide dense visual reward for robotic learning. Our findings highlight the potential of PROGRESSOR for scalable robotic applications where direct action labels and task-specific rewards are not readily available.


Figure 4: Top row: Training losses for all tasks. Bottom row: Validation losses for all tasks. Red is the full model. Blue is post-training pruning the bottom half of the SVD for every matrix in the model that is not the final layer. Green is post-training pruning the top half of the SVD. Notice that for all models, keeping the top half of the SVD is close to the full model performance, supporting the idea that the top directions provide a better approximation to the function.
Figure 7: Diagonal of alignment for a single pair over time (Eqn. 3) and alignment metric across pairs of matrices over time (Eqn. 4) where the y-axis represents depth. From top to bottom, for VGG we use coefficients {0, 0.001, 0.01, 0.1}, while for other networks we use coefficients {0, 0.1, 1, 10}. We see that the maximum alignment magnitude is higher with large weight decay, and in particular, the Transformer has the strongest alignment even when nonlinearities separate the MLP layers.
Figure 9: Top row: Singular vector agreement for a single matrix in the middle of each model (diagonal of Eqn. 5). Notice top singular vectors become stable in direction earlier. Bottom row: Summary score for each matrix across architectures. As we move down the y-axis, the depth of the parameters in the model increases, while the x-axis tracks training time. The sharp transition midway through training in the VGG case is likely due to a 10x learning rate decay.
Figure 10: Top row: Magnitude pruning. Bottom row: Random pruning. First column: Training loss. We see that at 5% sparsity magnitude pruning is significantly better than random pruning of the same layerwise sparsity. 2nd column: Singular vector alignment pre- and post-pruning at the end of training for a single layer (the 3rd convolution). We see that magnitude pruning approximates the top singular vectors, while random pruning at the same level does not. 3rd column: Singular vector alignment score pre- and post-pruning across all layers. Agreement is higher across all layers for magnitude pruning, though later layers do not agree, likely because later layers are wider and so their weights are lower in magnitude. 4th column: Singular vector alignment between the pruned and unpruned models along the training trajectory. We see that magnitude pruning still has similar dynamics in its top singular vectors, while random pruning does not. Last column: Singular vector alignment score between pruned and unpruned models across layers and time. Again, evolution is similar for early layers with magnitude pruning, and completely different for random pruning.
Figure 11: Top row: Barrier size vs. split step. Middle row: singular vector agreement for a single matrix parameter between branch endpoints that share a common trunk. Bottom row: summary statistic for singular vector agreement across layers vs. split step. We see that as models exhibit LMC, they also share top singular vectors.
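The "barrier size" in the Figure 11 caption refers to linear mode connectivity (LMC). A minimal sketch follows, assuming the commonly used definition: the barrier is the largest gap, along the straight line between two weight vectors, between the loss at the interpolated weights and the linear interpolation of the two endpoint losses. The toy losses and all function names are hypothetical illustrations, not code from the paper.

```python
from typing import Callable, List, Sequence


def interpolate(w_a: Sequence[float], w_b: Sequence[float], alpha: float) -> List[float]:
    """Return (1 - alpha) * w_a + alpha * w_b, elementwise."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(w_a, w_b)]


def loss_barrier(
    loss_fn: Callable[[Sequence[float]], float],
    w_a: Sequence[float],
    w_b: Sequence[float],
    num_points: int = 11,
) -> float:
    """Largest excess of the path loss over the interpolated endpoint losses."""
    loss_a = loss_fn(w_a)
    loss_b = loss_fn(w_b)
    barrier = 0.0
    for i in range(num_points):
        alpha = i / (num_points - 1)
        path_loss = loss_fn(interpolate(w_a, w_b, alpha))
        baseline = (1.0 - alpha) * loss_a + alpha * loss_b
        barrier = max(barrier, path_loss - baseline)
    return barrier


def double_well(weights: Sequence[float]) -> float:
    """Toy non-convex loss with minima at w = -1 and w = +1 (hypothetical)."""
    return sum((w * w - 1.0) ** 2 for w in weights)


def bowl(weights: Sequence[float]) -> float:
    """Toy convex loss (hypothetical); a straight path never rises above its endpoints."""
    return sum(w * w for w in weights)


barrier_between_wells = loss_barrier(double_well, [-1.0], [1.0])
barrier_in_bowl = loss_barrier(bowl, [-1.0], [1.0])
```

A barrier near zero means the two endpoints are linearly connected in this sense; the double-well case has a ridge between the two minima, so its barrier is positive.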


Approaching Deep Learning through the Spectral Dynamics of Weights
  • Preprint
  • File available

August 2024 · 222 Reads

We propose an empirical approach centered on the spectral dynamics of weights (the behavior of singular values and vectors during optimization) to unify and clarify several phenomena in deep learning. We identify a consistent bias in optimization across various experiments, from small-scale "grokking" to large-scale tasks like image classification with ConvNets, image generation with UNets, speech recognition with LSTMs, and language modeling with Transformers. We also demonstrate that weight decay enhances this bias beyond its role as a norm regularizer, even in practical systems. Moreover, we show that these spectral dynamics distinguish memorizing networks from generalizing ones, offering a novel perspective on this longstanding conundrum. Additionally, we leverage spectral dynamics to explore the emergence of well-performing sparse subnetworks (lottery tickets) and the structure of the loss surface through linear mode connectivity. Our findings suggest that spectral dynamics provide a coherent framework to better understand the behavior of neural networks across diverse settings.
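To make "the behavior of singular values and vectors" concrete, here is a minimal sketch of tracking the top singular value and top right singular vector of a weight matrix, and of comparing that direction between two snapshots. It uses plain power iteration as a dependency-free stand-in; a library SVD would normally be used instead. The agreement score (absolute cosine between top right singular vectors) is a simplification chosen for illustration and is not the paper's alignment metric; all names and the example matrices are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(a: Matrix, x: List[float]) -> List[float]:
    """Matrix-vector product for a row-major list-of-lists matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def normalize(x: List[float]) -> List[float]:
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def top_singular(a: Matrix, iters: int = 200) -> Tuple[float, List[float]]:
    """Power iteration on A^T A; returns (top singular value, top right singular vector).

    Assumes the all-ones start vector is not orthogonal to the top direction
    and that the top singular value is strictly larger than the second.
    """
    a_t = transpose(a)
    v = normalize([1.0] * len(a[0]))
    for _ in range(iters):
        v = normalize(matvec(a_t, matvec(a, v)))
    sigma = math.sqrt(sum(u * u for u in matvec(a, v)))
    return sigma, v


def top_direction_agreement(a: Matrix, b: Matrix) -> float:
    """Absolute cosine between the top right singular vectors of a and b."""
    _, v_a = top_singular(a)
    _, v_b = top_singular(b)
    return abs(sum(x * y for x, y in zip(v_a, v_b)))


# Hypothetical weight snapshots: the second is a small perturbation of the first.
w_step_100 = [[3.0, 0.0], [0.0, 1.0]]
w_step_200 = [[3.1, 0.1], [0.0, 0.9]]
sigma_100, _ = top_singular(w_step_100)
agreement = top_direction_agreement(w_step_100, w_step_200)
```

An agreement close to 1 indicates that the top direction has barely rotated between the two snapshots, which is the kind of stability the Figure 9 caption describes for top singular vectors.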




Keeper: Automated Testing and Fixing of Machine Learning Software

June 2024 · 25 Reads · 1 Citation

ACM Transactions on Software Engineering and Methodology

The increasing number of software applications incorporating machine learning (ML) solutions has led to the need for testing techniques. However, testing ML software requires tremendous human effort to design realistic and relevant test inputs, and to judge software output correctness according to human common sense. Even when misbehavior is exposed, it is often unclear whether the defect is inside ML API or the surrounding code, and how to fix the implementation. This article tackles these challenges by proposing Keeper, an automated testing and fixing tool for ML software. The core idea of Keeper is designing pseudo-inverse functions that semantically reverse the corresponding ML task in an empirical way and proxy common human judgment of real-world data. It incorporates these functions into a symbolic execution engine to generate tests. Keeper also detects code smells that degrade software performance. Once misbehavior is exposed, Keeper attempts to change how ML APIs are used to alleviate the misbehavior. Our evaluation on a variety of applications shows that Keeper greatly improves branch coverage, while identifying 74 previously unknown failures and 19 code smells from 56 out of 104 applications. Our user studies show that 78% of end-users and 95% of developers agree with Keeper's detection and fixing results.


Generative Lifting of Multiview to 3D from Unknown Pose: Wrapping NeRF inside Diffusion

June 2024 · 5 Reads

We cast multiview reconstruction from unknown pose as a generative modeling problem. From a collection of unannotated 2D images of a scene, our approach simultaneously learns both a network to predict camera pose from 2D image input, as well as the parameters of a Neural Radiance Field (NeRF) for the 3D scene. To drive learning, we wrap both the pose prediction network and NeRF inside a Denoising Diffusion Probabilistic Model (DDPM) and train the system via the standard denoising objective. Our framework requires the system accomplish the task of denoising an input 2D image by predicting its pose and rendering the NeRF from that pose. Learning to denoise thus forces the system to concurrently learn the underlying 3D NeRF representation and a mapping from images to camera extrinsic parameters. To facilitate the latter, we design a custom network architecture to represent pose as a distribution, granting implicit capacity for discovering view correspondences when trained end-to-end for denoising alone. This technique allows our system to successfully build NeRFs, without pose knowledge, for challenging scenes where competing methods fail. At the conclusion of training, our learned NeRF can be extracted and used as a 3D scene model; our full system can be used to sample novel camera poses and generate novel-view images.
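For reference, the "standard denoising objective" of a DDPM is commonly written in the noise-prediction form below. This is the generic textbook formulation with standard symbols (none of them appear in the abstract), and the abstract does not say which parameterization the paper itself uses.

```latex
% Forward noising of a clean image x_0 at timestep t
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% Simplified training loss: predict the injected noise
\mathcal{L}_{\text{simple}}
  = \mathbb{E}_{x_0,\, \epsilon,\, t}
    \left[ \left\| \epsilon - \epsilon_\theta(x_t, t) \right\|^2 \right]
```

In the setting the abstract describes, the denoiser's output would come from rendering the NeRF at the pose predicted from the noisy input, so the only learnable components inside the denoiser are the pose network and the NeRF parameters.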



Run-Time Prevention of Software Integration Failures of Machine Learning APIs

October 2023 · 28 Reads · 2 Citations

Proceedings of the ACM on Programming Languages

Due to the under-specified interfaces, developers face challenges in correctly integrating machine learning (ML) APIs in software. Even when the ML API and the software are well designed on their own, the resulting application misbehaves when the API output is incompatible with the software. It is desirable to have an adapter that converts ML API output at runtime to better fit the software need and prevent integration failures. In this paper, we conduct an empirical study to understand ML API integration problems in real-world applications. Guided by this study, we present SmartGear, a tool that automatically detects and converts mismatching or incorrect ML API output at run time, serving as a middle layer between ML API and software. Our evaluation on a variety of open-source applications shows that SmartGear detects 70% incompatible API outputs and prevents 67% potential integration failures, outperforming alternative solutions.


Accelerated Training via Incrementally Growing Neural Networks using Variance Transfer and Learning Rate Adaptation

June 2023 · 8 Reads

We develop an approach to efficiently grow neural networks, within which parameterization and optimization strategies are designed by considering their effects on the training dynamics. Unlike existing growing methods, which follow simple replication heuristics or utilize auxiliary gradient-based local optimization, we craft a parameterization scheme which dynamically stabilizes weight, activation, and gradient scaling as the architecture evolves, and maintains the inference functionality of the network. To address the optimization difficulty resulting from imbalanced training effort distributed to subnetworks fading in at different growth phases, we propose a learning rate adaption mechanism that rebalances the gradient contribution of these separate subcomponents. Experimental results show that our method achieves comparable or better accuracy than training large fixed-size models, while saving a substantial portion of the original computation budget for training. We demonstrate that these gains translate into real wall-clock training speedups.
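The requirement that growth "maintains the inference functionality of the network" can be illustrated with a generic function-preserving widening step. The sketch below is not the paper's variance-transfer parameterization: it uses the simple, well-known trick of giving newly added hidden units zero outgoing weights, so the output is unchanged at the moment of growth. All names, shapes, and the initialization scale are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def forward(w1: Matrix, w2: Matrix, x: List[float]) -> List[float]:
    """Two-layer MLP without biases: w2 @ relu(w1 @ x)."""
    return matvec(w2, relu(matvec(w1, x)))


def widen(w1: Matrix, w2: Matrix, extra_units: int, rng: random.Random) -> Tuple[Matrix, Matrix]:
    """Add hidden units without changing the network's output.

    New incoming weights are random (scale 0.1 is an arbitrary choice);
    new outgoing weights are zero, so the new units contribute nothing yet.
    """
    in_dim = len(w1[0])
    new_rows = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(extra_units)]
    new_w1 = [row[:] for row in w1] + new_rows
    new_w2 = [row[:] + [0.0] * extra_units for row in w2]
    return new_w1, new_w2


w1 = [[1.0, -2.0], [0.5, 0.3]]
w2 = [[1.0, 1.0]]
x = [1.0, 0.5]
before = forward(w1, w2, x)
grown_w1, grown_w2 = widen(w1, w2, extra_units=3, rng=random.Random(0))
after = forward(grown_w1, grown_w2, x)
```

Zero-initialized outgoing weights preserve the function but leave the new units with an unbalanced gradient scale, which is one motivation for more careful schemes such as the one the abstract describes.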


Citations (46)


... Furthermore, while these methods rely on DINO features, novel works such as DiffSeg [35] and EmerDiff [29] have shown that Stable Diffusion features can be used in downstream tasks with comparable and sometimes superior results. A recent study of the latent features of Stable Diffusion which employs spectral clustering methods demonstrates that these features encode useful semantic and spatial information [43], and motivates further exploration. ...

Reference:

Unsupervised Segmentation by Diffusing, Walking and Cutting
Deciphering ‘What’ and ‘Where’ Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations
  • Citing Conference Paper
  • June 2024

... While offloading KVs and attention computation have been popular recently [127,149,213], we need to be careful as it is challenging to predict the output lengths of LLM requests and thus their utilization patterns, and a KV for a single token may consume a few MBs. KVs of long documents can be precomputed, compressed, and fetched for later retrievals [184]. To reduce memory latency, one can opt for KV sharing across different attention heads [7,26,50], KV compression [63,129,176,184,236], model quantization [94,97,106,147,164,200,251,295,306], or different model architectures than Transformer, such as State Space Models (SSMs) [61,100] that do not rely on attentions, thereby not generating KVs. ...

CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving
  • Citing Conference Paper
  • August 2024

... RELATED WORK The widespread adoption of intelligent software across various industries has underscored the critical need to test software robustness [68], [69]. In NLP, models' expanding context window length has significantly increased the search space dimensionality, making manual test case construction inefficient [70] and costly [71]. Therefore, this paper focuses on automated testing methods for NLP software. ...

Keeper: Automated Testing and Fixing of Machine Learning Software
  • Citing Article
  • June 2024

ACM Transactions on Software Engineering and Methodology

... Challenge-1: Lacking interface specification. Performing cognitive and generation tasks, LLM agents and general AI components typically lack a specification detailing their behavior [10]. Given a particular input, LLM agents cannot specify whether they are able to provide a correct answer in an expected format. ...

Run-Time Prevention of Software Integration Failures of Machine Learning APIs
  • Citing Article
  • October 2023

Proceedings of the ACM on Programming Languages

... Beyond model-specific testing, researchers have explored testing approaches for entire AI-enabled software at the system level. This includes testing autonomous driving systems [82]-[96], machine translation systems [97]-[100] and applications leveraging cloud-based machine learning APIs [101], [102]. Additionally, empirical studies offer insights into the software engineering challenges faced by real-world AI applications. ...

Automated testing of software that uses machine learning APIs
  • Citing Conference Paper
  • July 2022

... In practice and to the best of our knowledge, there exists no unified understanding or theory of how the augmentation chain and the positive mining strategy influence downstream performance or each other, nor any one-size-fits-all recipe for contrastive learning approaches. Some works have attempted to alleviate these issues by devising better positive mining strategies [11], reducing false negatives within the mining strategy [12], [13], or influencing positivity and negativity with semantic weighing [8], [11], [14]. From the standpoint of the augmentation chain, a body of work on understanding the effect of certain augmentations on downstream tasks exists [15], [16], and similar studies have started to appear in MIR [17]. ...

Boosting Contrastive Self-Supervised Learning with False Negative Cancellation
  • Citing Conference Paper
  • January 2022

... In recent years, generative adversarial network (GAN) based methods have come to dominate the field of unsupervised foreground segmentation. Bielski et al. [39] encourage GAN to disentangle background and foreground by introducing randomly generated and translated foregrounds and masks; Chen et al. [40] employ GAN to segment foreground masks by redrawing objects without changing the distribution of the dataset; Savarese et al. [41] cluster foreground and background pixels from an information-theoretic perspective and generate pseudo-labels to train a segmentation model; Voynov et al. [42] demonstrate that pretraining GAN on large-scale datasets can improve the object segmentation performance. Although GAN based methods significantly boost performance in unsupervised foreground segmentation, they still have drawbacks, such as falling into trivial solutions easily and not being trainable end-to-end. ...

Information-Theoretic Segmentation by Inpainting Error Maximization
  • Citing Conference Paper
  • June 2021

... Beyond model-specific testing, researchers have explored testing approaches for entire AI-enabled software at the system level. This includes testing autonomous driving systems [82]-[96], machine translation systems [97]-[100] and applications leveraging cloud-based machine learning APIs [101], [102]. Additionally, empirical studies offer insights into the software engineering challenges faced by real-world AI applications. ...

Are Machine Learning Cloud APIs Used Correctly?
  • Citing Conference Paper
  • May 2021