Jonathan Lorraine

NVIDIA · Department of Research

Doctor of Philosophy

About

19 Publications
1,029 Reads
113 Citations
Introduction
I'm a research scientist at NVIDIA and a recent machine learning Ph.D. graduate from the University of Toronto. My research focuses on hyperparameter optimization, learning in games, and, more generally, nested optimization. Previously, I was at Google and at Meta AI (formerly Facebook AI).
Additional affiliations
June 2021 - October 2021
Meta
Position
  • Researcher
Description
  • Worked with Professor Jakob Foerster and a team of five to improve machine learning in multi-agent systems.
  • Authored a spotlight paper on the research findings at the International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2022).
  • Advised product teams on state-of-the-art hyperparameter optimization techniques, leveraging personal research to enhance model performance and efficiency across various projects.
November 2021 - February 2022
Google Inc.
Position
  • Researcher
Description
  • Blended research and applied engineering to improve an end-to-end AutoML platform adopted company-wide at Google to facilitate the development of production-ready models.
  • Designed the method used by the team to select design choices (hyperparameters), significantly increasing production performance while using roughly 10x fewer computational resources.
  • Led publication of a paper detailing the method: "Task Selection for AutoML System Evaluation."
May 2018 - December 2024
Vector Institute
Position
  • Researcher
Description
  • Through a partnership between the Vector Institute and the University of Toronto, I researched hyperparameter optimization, learning in games, nested optimization, and related topics. I have authored 10+ papers accepted for presentation at top ML conferences, including NeurIPS, AISTATS, and ICML.
Education
September 2018 - July 2024
University of Toronto
Field of study
  • Machine Learning

Publications (19)
Preprint
Full-text available
This work explores expanding the capabilities of large language models (LLMs) pretrained on text to generate 3D meshes within a unified model. This offers key advantages of (1) leveraging spatial knowledge already embedded in LLMs, derived from textual sources like 3D tutorials, and (2) enabling conversational 3D generation and mesh understanding....
Preprint
Full-text available
Diffusion models achieve high-quality sample generation at the cost of a lengthy multistep inference procedure. To overcome this, diffusion distillation techniques produce student generators capable of matching or surpassing the teacher in a single step. However, the student model's inference speed is limited by the size of the teacher architecture...
Preprint
Full-text available
Neural networks are trained to learn an approximate mapping from an input domain to a target domain. Incorporating prior knowledge about true mappings is critical to learning a useful approximation. With current architectures, it is challenging to enforce structure on the derivatives of the input-output mapping. We propose to use a neural network t...
Preprint
Full-text available
Gradient-based optimization has been critical to the success of machine learning, updating a single set of parameters to minimize a single loss. A growing number of applications rely on a generalization of this, where we have a bilevel or nested optimization of which subsets of parameters update on different objectives nested inside each other. We...
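
For readers new to the setup, the sketch below shows one common instance of such nesting: tuning a regularization hyperparameter by backpropagating a validation loss through a short, unrolled inner training loop. The data, losses, unroll length, and optimizer choices are placeholders, not details from this work.

    # Illustrative nested optimization: the inner parameters w take a few
    # differentiable gradient steps on a training loss, and the outer
    # hyperparameter is updated by backpropagating a validation loss
    # through that unroll. All names and settings are placeholders.
    import torch

    torch.manual_seed(0)
    x_train, y_train = torch.randn(64, 4), torch.randn(64, 1)
    x_val, y_val = torch.randn(64, 4), torch.randn(64, 1)

    log_lam = torch.tensor(-2.0, requires_grad=True)   # outer variable: log of an L2 penalty
    outer_opt = torch.optim.Adam([log_lam], lr=1e-2)

    def train_loss(w, log_lam):
        return ((x_train @ w - y_train) ** 2).mean() + log_lam.exp() * (w ** 2).sum()

    def val_loss(w):
        return ((x_val @ w - y_val) ** 2).mean()

    for outer_step in range(100):
        w = torch.zeros(4, 1, requires_grad=True)      # inner variable, reset each outer step
        for _ in range(10):                            # inner problem: short differentiable unroll
            g = torch.autograd.grad(train_loss(w, log_lam), w, create_graph=True)[0]
            w = w - 0.1 * g
        outer_opt.zero_grad()
        val_loss(w).backward()                         # outer problem: gradient w.r.t. log_lam
        outer_opt.step()

Unrolled differentiation is only one estimator for the outer gradient; implicit differentiation (see the hyperparameter-optimization entries further down) avoids storing the unroll at the cost of an approximate inverse-Hessian solve.
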
Preprint
Full-text available
When training deep learning models, the performance depends largely on the selected hyperparameters. However, hyperparameter optimization (HPO) is often one of the most expensive parts of model design. Classical HPO methods treat this as a black-box optimization problem. However, gray-box HPO methods, which incorporate more information about the se...
Preprint
Full-text available
Many training data attribution (TDA) methods aim to estimate how a model's behavior would change if one or more data points were removed from the training set. Methods based on implicit differentiation, such as influence functions, can be made computationally efficient, but fail to account for underspecification, the implicit bias of the optimizati...
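
For context, the classical influence-function estimate that implicit-differentiation TDA methods build on has the standard form from that literature (generic notation, for a model \hat{\theta} trained on n points with per-example loss L): removing a training point z shifts the optimum by approximately

    \hat{\theta}_{-z} - \hat{\theta} \approx \tfrac{1}{n}\, H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta}),
    \qquad H_{\hat{\theta}} = \tfrac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta}),

and the predicted change in a test loss is \nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} (\hat{\theta}_{-z} - \hat{\theta}). This derivation assumes a unique optimum with an invertible Hessian, which is precisely what underspecified (e.g., overparameterized) models lack.
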
Preprint
Full-text available
Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields. DreamFusion recently achieved high-quality results but requires a lengthy, per-prompt optimization to create 3D objects. To address this, we amortize optimization over text prompts by training on many pr...
Preprint
Full-text available
Many problems in machine learning involve bilevel optimization (BLO), including hyperparameter optimization, meta-learning, and dataset distillation. Bilevel problems consist of two nested sub-problems, called the outer and inner problems, respectively. In practice, often at least one of these sub-problems is overparameterized. In this case, there...
Preprint
Full-text available
Our goal is to assess if AutoML system changes - i.e., to the search space or hyperparameter optimization - will improve the final model's performance on production tasks. However, we cannot test the changes on production tasks. Instead, we only have access to limited descriptors about tasks that our AutoML system previously executed, like the numb...
Preprint
Full-text available
Ridge Rider (RR) is an algorithm for finding diverse solutions to optimization problems by following eigenvectors of the Hessian ("ridges"). RR is designed for conservative gradient systems (i.e., settings involving a single loss function), where it branches at saddles - easy-to-find bifurcation points. We generalize this idea to non-conservative,...
Preprint
Full-text available
The gradients of convex functions are expressive models of non-trivial vector fields. For example, Brenier's theorem yields that the optimal transport map between any two measures on Euclidean space under the squared distance is realized as a convex gradient, which is a key insight used in recent generative flow models. In this paper, we study how...
Preprint
Full-text available
Pre-training (PT) followed by fine-tuning (FT) is an effective method for training neural networks, and has led to significant performance improvements in many domains. PT can incorporate various design choices such as task and data reweighting strategies, augmentation policies, and noise models, all of which can significantly impact the quality of...
Preprint
Full-text available
We generalize gradient descent with momentum for learning in differentiable games to have complex-valued momentum. We give theoretical motivation for our method by proving convergence on bilinear zero-sum games for simultaneous and alternating updates. Our method gives real-valued parameter updates, making it a drop-in replacement for standard opti...
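
The mechanics are easy to sketch: keep a complex-valued momentum buffer, but apply only the real part of the momentum step, so the parameters themselves stay real. The toy run below uses the bilinear zero-sum game min_x max_y xy; the step size and momentum coefficient are illustrative choices, not values recommended by this work.

    # Complex-valued momentum on the bilinear game min_x max_y x*y.
    # Only the real part of the momentum step is applied, so x and y stay real.
    # alpha and beta are illustrative values, not the paper's.
    import numpy as np

    x, y = 1.0, 1.0               # real-valued player parameters
    mx, my = 0j, 0j               # complex momentum buffers
    alpha, beta = 0.5, 0.7j       # real step size, complex momentum coefficient

    for t in range(400):
        gx, gy = y, -x            # simultaneous gradients for the min and max players
        mx = beta * mx - gx
        my = beta * my - gy
        x += np.real(alpha * mx)  # real part only: a drop-in change to standard momentum
        y += np.real(alpha * my)

    # The iterates spiral in toward the equilibrium (0, 0); with beta = 0,
    # i.e., plain simultaneous gradient descent, they spiral outward instead.
    print(x, y)
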
Preprint
Full-text available
We propose an algorithm for inexpensive gradient-based hyperparameter optimization that combines the implicit function theorem (IFT) with efficient inverse Hessian approximations. We present results about the relationship between the IFT and differentiating through optimization, motivating our algorithm. We use the proposed approach to train modern...
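
For context, the implicit-function-theorem hypergradient at the heart of this approach has the standard textbook form, written here in generic notation with w*(λ) the optimum of the training loss for hyperparameters λ:

    \frac{d\mathcal{L}_{\text{val}}}{d\lambda}
      = \frac{\partial \mathcal{L}_{\text{val}}}{\partial \lambda}
      - \frac{\partial \mathcal{L}_{\text{val}}}{\partial w}
        \left[\frac{\partial^{2} \mathcal{L}_{\text{train}}}{\partial w\,\partial w^{\top}}\right]^{-1}
        \frac{\partial^{2} \mathcal{L}_{\text{train}}}{\partial w\,\partial \lambda^{\top}}
        \Bigg|_{w = w^{*}(\lambda)}.

The bracketed inverse is never formed explicitly for modern networks; the "efficient inverse Hessian approximations" in the abstract stand in for approximate inverse-Hessian-vector products (for example, truncated iterative schemes), with the specific choice described in the full paper.
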
Preprint
Full-text available
Automatic methods for generating state-of-the-art neural network architectures without human experts have generated significant attention recently. This is because of the potential to remove human experts from the design loop which can reduce costs and decrease time to model deployment. Neural architecture search (NAS) techniques have improved sign...
Preprint
Full-text available
Hyperparameter optimization can be formulated as a bilevel optimization problem, where the optimal parameters on the training set depend on the hyperparameters. We aim to adapt regularization hyperparameters for neural networks by fitting compact approximations to the best-response function, which maps hyperparameters to optimal weights and biases....
Article
Full-text available
Machine learning models are often tuned by nesting optimization of model weights inside the optimization of hyperparameters. We give a method to collapse this nested optimization into joint stochastic optimization of weights and hyperparameters. Our process trains a neural network to output approximately optimal weights as a function of hyperparame...
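
A toy version makes the idea concrete: a small response network maps a hyperparameter to model weights and is trained jointly with that hyperparameter, alternating a training-loss step for the response network with a validation-loss step for the hyperparameter. Everything below (architecture, losses, perturbation scale) is an illustrative stand-in rather than this article's construction.

    # Joint (non-nested) optimization: a response network maps a hyperparameter
    # to model weights, and the hyperparameter is tuned through it.
    # All architectural and loss choices here are illustrative placeholders.
    import torch

    torch.manual_seed(0)
    x_train, y_train = torch.randn(128, 8), torch.randn(128, 1)
    x_val, y_val = torch.randn(128, 8), torch.randn(128, 1)

    # Response network: scalar hyperparameter -> weights of a linear model.
    response = torch.nn.Sequential(
        torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 8)
    )
    log_lam = torch.tensor(0.0, requires_grad=True)   # log of an L2 penalty
    opt_w = torch.optim.Adam(response.parameters(), lr=1e-2)
    opt_h = torch.optim.Adam([log_lam], lr=1e-2)

    for step in range(2000):
        # (1) Fit the response network to the regularized training loss,
        #     at a hyperparameter sampled near the current value.
        lam = (log_lam + 0.1 * torch.randn(())).detach()
        w = response(lam.view(1, 1)).view(8, 1)
        train_loss = ((x_train @ w - y_train) ** 2).mean() + lam.exp() * (w ** 2).sum()
        opt_w.zero_grad()
        train_loss.backward()
        opt_w.step()

        # (2) Update the hyperparameter on the validation loss, differentiating
        #     through the response network (whose weights are not updated here).
        w = response(log_lam.view(1, 1)).view(8, 1)
        val_loss = ((x_val @ w - y_val) ** 2).mean()
        opt_h.zero_grad()
        val_loss.backward()
        opt_h.step()
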
