Jonathan LorraineNVIDIA | Nvidia · Department of Research
Jonathan Lorraine
Doctor of Philosophy
About
19
Publications
1,029
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
113
Citations
Introduction
I'm a research scientist at NVIDIA and a recent machine learning Ph.D. from the University of Toronto. My research focuses on hyperparameter optimization, learning in games, and - more generally - nested optimization. Previously, I was at Google and Facebook (now Meta) AI.
Skills and Expertise
Additional affiliations
June 2021 - October 2021
Position
- Researcher
Description
- I worked with Professor Jakob Foerster and a team of five to improve machine learning in multi-agent systems. - Authored spotlight paper on research findings at The International Conference on Autonomous and Multiagent Systems (AAMAS2022). - Advised product teams on state-of-the-art hyperparameter optimization techniques, leveraging personal research to enhance model performance and efficiency across various projects.
November 2021 - February 2022
Position
- Researcher
Description
- I blended research and applied engineering to improve an end-to-end AutoML platform adopted company-wide at Google to facilitate the development of production-ready models. - Designed method used by the team to select design choices (hyperparameters), significantly increasing production performance while using ~10x less computational resources. - Led publication of paper detailing method: "Task Selection for AutoML System Evaluation."
May 2018 - December 2024
Vector Institute
Position
- Researcher
Description
- As a partnership between the Vector Institute and the University of Toronto, I researched hyperparameter optimization, learning in games, nested optimization, and more. I’ve authored 10+ papers accepted for presentation to top ML conferences, including NeurIPS, AISTATS and ICML.
Education
September 2018 - July 2024
Publications
Publications (19)
This work explores expanding the capabilities of large language models (LLMs) pretrained on text to generate 3D meshes within a unified model. This offers key advantages of (1) leveraging spatial knowledge already embedded in LLMs, derived from textual sources like 3D tutorials, and (2) enabling conversational 3D generation and mesh understanding....
Diffusion models achieve high-quality sample generation at the cost of a lengthy multistep inference procedure. To overcome this, diffusion distillation techniques produce student generators capable of matching or surpassing the teacher in a single step. However, the student model's inference speed is limited by the size of the teacher architecture...
Neural networks are trained to learn an approximate mapping from an input domain to a target domain. Incorporating prior knowledge about true mappings is critical to learning a useful approximation. With current architectures, it is challenging to enforce structure on the derivatives of the input-output mapping. We propose to use a neural network t...
Gradient-based optimization has been critical to the success of machine learning, updating a single set of parameters to minimize a single loss. A growing number of applications rely on a generalization of this, where we have a bilevel or nested optimization of which subsets of parameters update on different objectives nested inside each other. We...
When training deep learning models, the performance depends largely on the selected hyperparameters. However, hyperparameter optimization (HPO) is often one of the most expensive parts of model design. Classical HPO methods treat this as a black-box optimization problem. However, gray-box HPO methods, which incorporate more information about the se...
Many training data attribution (TDA) methods aim to estimate how a model's behavior would change if one or more data points were removed from the training set. Methods based on implicit differentiation, such as influence functions, can be made computationally efficient, but fail to account for underspecification, the implicit bias of the optimizati...
Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields. DreamFusion recently achieved high-quality results but requires a lengthy, per-prompt optimization to create 3D objects. To address this, we amortize optimization over text prompts by training on many pr...
Many problems in machine learning involve bilevel optimization (BLO), including hyperparameter optimization, meta-learning, and dataset distillation. Bilevel problems consist of two nested sub-problems, called the outer and inner problems, respectively. In practice, often at least one of these sub-problems is overparameterized. In this case, there...
Our goal is to assess if AutoML system changes - i.e., to the search space or hyperparameter optimization - will improve the final model's performance on production tasks. However, we cannot test the changes on production tasks. Instead, we only have access to limited descriptors about tasks that our AutoML system previously executed, like the numb...
Ridge Rider (RR) is an algorithm for finding diverse solutions to optimization problems by following eigenvectors of the Hessian ("ridges"). RR is designed for conservative gradient systems (i.e., settings involving a single loss function), where it branches at saddles - easy-to-find bifurcation points. We generalize this idea to non-conservative,...
The gradients of convex functions are expressive models of non-trivial vector fields. For example, Brenier's theorem yields that the optimal transport map between any two measures on Euclidean space under the squared distance is realized as a convex gradient, which is a key insight used in recent generative flow models. In this paper, we study how...
Pre-training (PT) followed by fine-tuning (FT) is an effective method for training neural networks, and has led to significant performance improvements in many domains. PT can incorporate various design choices such as task and data reweighting strategies, augmentation policies, and noise models, all of which can significantly impact the quality of...
We generalize gradient descent with momentum for learning in differentiable games to have complex-valued momentum. We give theoretical motivation for our method by proving convergence on bilinear zero-sum games for simultaneous and alternating updates. Our method gives real-valued parameter updates, making it a drop-in replacement for standard opti...
We propose an algorithm for inexpensive gradient-based hyperparameter optimization that combines the implicit function theorem (IFT) with efficient inverse Hessian approximations. We present results about the relationship between the IFT and differentiating through optimization, motivating our algorithm. We use the proposed approach to train modern...
Automatic methods for generating state-of-the-art neural network architectures without human experts have generated significant attention recently. This is because of the potential to remove human experts from the design loop which can reduce costs and decrease time to model deployment. Neural architecture search (NAS) techniques have improved sign...
Hyperparameter optimization can be formulated as a bilevel optimization problem, where the optimal parameters on the training set depend on the hyperparameters. We aim to adapt regularization hyperparameters for neural networks by fitting compact approximations to the best-response function, which maps hyperparameters to optimal weights and biases....
Machine learning models are often tuned by nesting optimization of model weights inside the optimization of hyperparameters. We give a method to collapse this nested optimization into joint stochastic optimization of weights and hyperparameters. Our process trains a neural network to output approximately optimal weights as a function of hyperparame...