Tommi Jaakkola’s research while affiliated with Massachusetts Institute of Technology and other places


Publications (394)


Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
  • Preprint
  • File available

January 2025 · 12 Reads

Nanye Ma · Shangyuan Tong · Haolin Jia · [...] · Saining Xie

Generative models have made significant impacts across various domains, largely due to their ability to scale during training by increasing data, computational resources, and model size, a phenomenon characterized by the scaling laws. Recent research has begun to explore inference-time scaling behavior in Large Language Models (LLMs), revealing how performance can further improve with additional computation during inference. Unlike LLMs, diffusion models inherently possess the flexibility to adjust inference-time computation via the number of denoising steps, although the performance gains typically flatten after a few dozen. In this work, we explore the inference-time scaling behavior of diffusion models beyond increasing denoising steps and investigate how the generation performance can further improve with increased computation. Specifically, we consider a search problem aimed at identifying better noises for the diffusion sampling process. We structure the design space along two axes: the verifiers used to provide feedback, and the algorithms used to find better noise candidates. Through extensive experiments on class-conditioned and text-conditioned image generation benchmarks, our findings reveal that increasing inference-time compute leads to substantial improvements in the quality of samples generated by diffusion models, and, given the complicated nature of images, combinations of the components in the framework can be specifically chosen to suit different application scenarios.
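To make the two design axes concrete, the following is a minimal sketch of the simplest instantiation: random search over candidate initial noises, scored by a verifier. Here `sample_with_diffusion` and `verifier_score` are hypothetical stand-ins for a full denoising run and a feedback model, not the paper's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_with_diffusion(noise):
    """Stand-in for a full denoising run mapping initial noise to a sample."""
    return np.tanh(noise)  # placeholder dynamics

def verifier_score(sample):
    """Stand-in verifier providing feedback on sample quality."""
    return -float(np.mean((np.abs(sample) - 0.5) ** 2))  # placeholder objective

def search_noise(shape, n_candidates=16):
    """Random search over initial noises; n_candidates is the compute knob."""
    best_noise, best_score = None, -np.inf
    for _ in range(n_candidates):
        noise = rng.standard_normal(shape)
        score = verifier_score(sample_with_diffusion(noise))
        if score > best_score:
            best_noise, best_score = noise, score
    return best_noise, best_score

noise, score = search_noise((3, 32, 32), n_candidates=16)
print(f"best verifier score over 16 candidates: {score:.4f}")
```

Scaling `n_candidates` is exactly the additional inference-time compute the abstract refers to; the second axis of the design space swaps this random search for more sophisticated search algorithms.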


Boltz-1: Democratizing Biomolecular Interaction Modeling

November 2024 · 24 Reads · 6 Citations

Understanding biomolecular interactions is fundamental to advancing fields like drug discovery and protein design. In this paper, we introduce Boltz-1, an open-source deep learning model incorporating innovations in model architecture, speed optimization, and data processing, achieving AlphaFold3-level accuracy in predicting the 3D structures of biomolecular complexes. Boltz-1 demonstrates performance on par with state-of-the-art commercial models on a range of diverse benchmarks, setting a new benchmark for commercially accessible tools in structural biology. By releasing the training and inference code, model weights, datasets, and benchmarks under the MIT open license, we aim to foster global collaboration, accelerate discoveries, and provide a robust platform for advancing biomolecular modeling.


Figure 3: IB Curve
Figure 4a illustrates the performance of the shared representation Ẑ_c learned by DisentangledSSL across different values of β. For lower values of β, Ẑ_c captures both shared and specific features, as indicated by linear probing accuracy on Y¹_s, Y²_s, and Y_c exceeding 0.5. As β increases, all accuracies decrease, reflecting the trade-off between expressivity and redundancy controlled by δ_c when MNI is unattainable. This trend aligns well with Figure 3 and Proposition 2. A comparison of the performance of the shared representation with baseline methods is provided in Appendix H.1. We further examine this trade-off on synthetic data with varying levels of multimodal entanglement, where some dimensions are aligned across modalities with MNI attainable and others remain entangled with MNI unattainable. Detailed results are provided in Appendix H.1. Given the shared representations Ẑ¹_c and Ẑ²_c learned in step 1 for a fixed β, we then learn the corresponding modality-specific representations Ẑ¹_s and Ẑ²_s with varying λ. Figure 4b shows the performance of DisentangledSSL in contrast to other baseline methods, where dots are connected in descending order of their corresponding λ values. The ideal modality-specific representation Ẑ¹_s should maximize unique information from X₁, shown by high accuracy on Y¹_s, while minimizing shared information with X₂, indicated by low accuracy on Y_c. Therefore, a bottom-right point is preferred in Figure 4b. As illustrated in Figure 4b, DisentangledSSL outperforms all other methods across various hyperparameter settings, especially JointOpt, demonstrating the effectiveness of the stepwise optimization procedure. Results on Ẑ²_s are provided in Appendix H.1.
Figure 4: Simulation study results.
Retrieval accuracy and mean reciprocal rank (MRR) of molecule-phenotype retrieval.
Prediction accuracy (%) of the representations learned by different methods on MultiBench datasets, with standard deviations over 3 random seeds.
An Information Criterion for Controlled Disentanglement of Multimodal Data

October 2024 · 15 Reads

Multimodal representation learning seeks to relate and decompose information inherent in multiple modalities. By disentangling modality-specific information from information that is shared across modalities, we can improve interpretability and robustness and enable downstream tasks such as the generation of counterfactual outcomes. Separating the two types of information is challenging since they are often deeply entangled in many real-world applications. We propose Disentangled Self-Supervised Learning (DisentangledSSL), a novel self-supervised approach for learning disentangled representations. We present a comprehensive analysis of the optimality of each disentangled representation, particularly focusing on the scenario not covered in prior work where the so-called Minimum Necessary Information (MNI) point is not attainable. We demonstrate that DisentangledSSL successfully learns shared and modality-specific features on multiple synthetic and real-world datasets and consistently outperforms baselines on various downstream tasks, including prediction tasks for vision-language data, as well as molecule-phenotype retrieval tasks for biological data.
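The figure discussion above mentions a two-step procedure: shared representations are learned first (for a fixed β), then frozen while modality-specific representations are learned (with varying λ). Below is a minimal structural sketch of that stepwise optimization, with toy encoders, placeholder information proxies, and a random-search "optimizer" standing in for the paper's actual objectives and training loop.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.standard_normal((64, 8))                     # toy modality 1
x2 = 0.5 * x1 + 0.5 * rng.standard_normal((64, 8))    # toy modality 2 (partly shared)

def encode(x, W):
    return np.tanh(x @ W)

def shared_objective(W, beta):
    """Step 1 (toy proxy): agree across modalities, pay a beta-weighted capacity cost."""
    z1, z2 = encode(x1, W), encode(x2, W)
    agreement = -np.mean((z1 - z2) ** 2)    # proxy for shared information
    capacity = np.mean(z1 ** 2)             # proxy for expressivity/redundancy
    return agreement - beta * capacity

def specific_objective(W, W_c, lam):
    """Step 2 (toy proxy): keep modality-1 signal, leak little of the frozen Z_c."""
    z_c = encode(x1, W_c)                        # shared representation, frozen
    z_s = encode(x1, W)
    signal = -np.mean((z_s - x1[:, :4]) ** 2)    # proxy for unique information
    leakage = np.mean((z_s * z_c) ** 2)          # proxy for shared-info leakage
    return signal - lam * leakage

def fit(objective, shape, iters=500):
    """Toy optimizer: random search (a real model would use SGD)."""
    best, best_val = None, -np.inf
    for _ in range(iters):
        W = 0.3 * rng.standard_normal(shape)
        if (val := objective(W)) > best_val:
            best, best_val = W, val
    return best

W_c = fit(lambda W: shared_objective(W, beta=0.1), (8, 4))        # step 1: fixed beta
W_s = fit(lambda W: specific_objective(W, W_c, lam=1.0), (8, 4))  # step 2: vary lambda
```

The point of the structure, per the discussion above, is that β controls the expressivity/redundancy trade-off of the shared representation, while λ trades unique information against leakage of shared information in the specific one.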


Figure 2: Evolution of various HGFs in joint coordinate-velocity space from t = 0 (blue) to t = T (red) with trajectories (black). Data distribution π(x) = 0.4·N(−2, 1) + 0.6·N(2, 1). Diffusion models and flow matching have zero force fields, i.e. the velocity does not change. Diffusion models do not converge in finite time (here, T = 3). The coupled distribution in FM allows for convergence at T = 1. Both distort the joint distribution. Oscillation HGFs only rotate the distribution.
Figure 3: Empirical investigation of Hamiltonian score discrepancy (HSD). (a) The Taylor approximation is a good approximation. (b) Hamiltonian score discrepancy is strongly correlated with the explicit score matching loss. (c) The signal-to-noise ratio is significantly better for HSM than for DSM at low σ.
Figure 5: Data distribution (left) and velocity distribution (right) used as the initial distribution for Reflection HGFs. With the above starting conditions, a reflection (an "infinite force") at the boundaries of the domain is used to simulate trajectories forward (this can be computed in closed form in a simulation-free manner).
Hamiltonian Score Matching and Generative Flows

October 2024 · 12 Reads

Classical Hamiltonian mechanics has been widely used in machine learning in the form of Hamiltonian Monte Carlo for applications with predetermined force fields. In this work, we explore the potential of deliberately designing force fields for Hamiltonian ODEs, introducing Hamiltonian velocity predictors (HVPs) as a tool for score matching and generative models. We present two innovations constructed with HVPs: Hamiltonian Score Matching (HSM), which estimates score functions by augmenting data via Hamiltonian trajectories, and Hamiltonian Generative Flows (HGFs), a novel generative model that encompasses diffusion models and flow matching as HGFs with zero force fields. We showcase the extended design space of force fields by introducing Oscillation HGFs, a generative model inspired by harmonic oscillators. Our experiments validate our theoretical insights about HSM as a novel score matching metric and demonstrate that HGFs rival leading generative modeling techniques.
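As a rough illustration of this design space, here is a minimal sketch of simulating trajectories in joint coordinate-velocity space under a designed force field, using leapfrog integration. Per the Figure 2 caption above, a zero force field leaves velocities unchanged (the diffusion/flow-matching case), while force(x) = −x gives the harmonic-oscillator dynamics behind Oscillation HGFs. This is an illustrative sketch, not the paper's implementation.

```python
import numpy as np

def leapfrog(x, v, force, dt=0.01, steps=300):
    """Integrate dx/dt = v, dv/dt = force(x) with a leapfrog scheme."""
    xs, vs = [x.copy()], [v.copy()]
    v = v + 0.5 * dt * force(x)                # initial half kick
    for _ in range(steps):
        x = x + dt * v                         # drift
        v = v + dt * force(x)                  # kick (staggered half-step)
        xs.append(x.copy())
        vs.append(v - 0.5 * dt * force(x))     # velocity synchronized with x
    return np.stack(xs), np.stack(vs)

rng = np.random.default_rng(0)
# Data distribution from the Figure 2 caption: 0.4·N(−2, 1) + 0.6·N(2, 1)
x0 = rng.choice([-2.0, 2.0], size=512, p=[0.4, 0.6]) + rng.standard_normal(512)
v0 = rng.standard_normal(512)

xs, vs = leapfrog(x0, v0, force=lambda x: np.zeros_like(x))  # zero force field
assert np.allclose(vs[0], vs[-1])                            # velocity never changes
xs_osc, vs_osc = leapfrog(x0, v0, force=lambda x: -x)        # rotates (x, v) space
```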


Figure 4: Examples of generated images on CIFAR10 (top) and ImageNet32 (bottom).
Generator Matching: Generative modeling with arbitrary Markov processes

October 2024 · 10 Reads

We introduce generator matching, a modality-agnostic framework for generative modeling using arbitrary Markov processes. Generators characterize the infinitesimal evolution of a Markov process, which we leverage for generative modeling in a similar vein to flow matching: we construct conditional generators which generate single data points, then learn to approximate the marginal generator which generates the full data distribution. We show that generator matching unifies various generative modeling methods, including diffusion models, flow matching and discrete diffusion models. Furthermore, it provides the foundation to expand the design space to new and unexplored Markov processes such as jump processes. Finally, generator matching enables the construction of superpositions of Markov generative processes and, in a rigorous manner, of multimodal models. We empirically validate our method on protein and image structure generation, showing that superposition with a jump process improves image generation.
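As a concrete special case, flow matching is the instance of generator matching where the generator reduces to a vector field: one regresses a model onto conditional velocities so that it approximates the marginal generator. A minimal one-dimensional sketch under that reading follows, with a deliberately weak linear-in-features model; it only illustrates the conditional-to-marginal recipe, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.choice([-2.0, 2.0], size=4096) + 0.1 * rng.standard_normal(4096)  # toy data

# Conditional construction: straight-line paths from noise x0 to data x1;
# the conditional generator along such a path is the velocity x1 - x0.
x0 = rng.standard_normal(4096)
t = rng.uniform(size=4096)
xt = (1 - t) * x0 + t * x1
target = x1 - x0

# Learn the marginal generator by regressing on conditional targets.
# Toy model: v(x, t) = w · [1, x, t, x*t], fit by least squares.
phi = np.stack([np.ones_like(xt), xt, t, xt * t], axis=1)
w, *_ = np.linalg.lstsq(phi, target, rcond=None)

def v(x, tt):
    return w[0] + w[1] * x + w[2] * tt + w[3] * x * tt

# Sampling: integrate dx/dt = v(x, t) from noise at t = 0 to t = 1.
x = rng.standard_normal(1000)
for k in range(100):
    x = x + 0.01 * v(x, k / 100)
print(f"generated std {x.std():.2f} vs data std {x1.std():.2f}")
```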


A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing

October 2024 · 3 Reads

Efficiently processing structured point cloud data while preserving multiscale information is a key challenge across domains, from graphics to atomistic modeling. Using a curated dataset of simulated galaxy positions and properties, represented as point clouds, we benchmark the ability of graph neural networks to simultaneously capture local clustering environments and long-range correlations. Given the homogeneous and isotropic nature of the Universe, the data exhibits a high degree of symmetry. We therefore focus on evaluating the performance of Euclidean symmetry-preserving (E(3)-equivariant) graph neural networks, showing that they can outperform non-equivariant counterparts and domain-specific information extraction techniques in both downstream performance and simulation efficiency. However, we find that current architectures fail to capture information from long-range correlations as effectively as domain-specific baselines, motivating future work on architectures better suited for extracting long-range information.


Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning

October 2024 · 24 Reads

Recent studies have identified the knowledge inconsistency between pre-training and fine-tuning as one aggravating factor of LLM hallucinations: unfamiliar fine-tuning data can mislead the LLM into fabricating plausible but wrong outputs. In this paper, we propose a novel fine-tuning strategy called Prereq-Tune to address this knowledge inconsistency and reduce hallucinations. Fundamentally, Prereq-Tune disentangles the learning of skills and knowledge, so the model learns only the task skills without being impacted by the knowledge inconsistency. To achieve this, Prereq-Tune introduces an additional prerequisite learning stage to learn the necessary knowledge for SFT, allowing subsequent SFT to focus only on task skills. Prereq-Tune can also be combined with fictitious synthetic data to enhance the grounding of LLM outputs to their internal knowledge. Experiments show that Prereq-Tune outperforms existing baselines in improving LLM factuality across short QA and long-form generation tasks. It also opens new possibilities for knowledge-controlled generation in LLMs. Our code is available at https://github.com/UCSB-NLP-Chang/Prereq_tune.git.
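A minimal structural sketch of the two-stage recipe as the abstract describes it: a prerequisite stage first tunes on the knowledge the SFT data assumes (possibly fictitious synthetic facts), and SFT then only has to teach the task skill. `ToyLM` and `finetune` are hypothetical stand-ins for a model and a causal-LM training loop, not the paper's API.

```python
from dataclasses import dataclass, field

@dataclass
class ToyLM:
    """Stand-in for an LLM; it just records the corpora it was tuned on."""
    history: list = field(default_factory=list)

def finetune(model: ToyLM, corpus: list[str], stage: str) -> ToyLM:
    """Stand-in for a causal-LM training loop."""
    model.history.append((stage, corpus))
    return model

# Stage 1 (prerequisite learning): teach the knowledge the SFT data assumes.
# Fictitious synthetic facts keep the skill grounded in internal knowledge.
prereq = ["Fictitious fact: the planet Zorblat has two moons."]
# Stage 2 (SFT): teach only the task skill, e.g. grounded short-form QA.
sft = ["Q: How many moons does Zorblat have? A: Two."]

model = finetune(ToyLM(), prereq, stage="prerequisite")
model = finetune(model, sft, stage="sft")
print(model.history)
```

The ordering is the point: because the prerequisite stage has already supplied the fact, the SFT example no longer asks the model to answer about something it never learned, which is the knowledge inconsistency the abstract targets.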


Think While You Generate: Discrete Diffusion with Planned Denoising

October 2024 · 67 Reads

Discrete diffusion has achieved state-of-the-art performance, outperforming or approaching autoregressive models on standard benchmarks. In this work, we introduce Discrete Diffusion with Planned Denoising (DDPD), a novel framework that separates the generation process into two models: a planner and a denoiser. At inference time, the planner selects which positions to denoise next by identifying the most corrupted positions in need of denoising, including both initially corrupted and those requiring additional refinement. This plan-and-denoise approach enables more efficient reconstruction during generation by iteratively identifying and denoising corruptions in the optimal order. DDPD outperforms traditional denoiser-only mask diffusion methods, achieving superior results on language modeling benchmarks such as text8, OpenWebText, and token-based generation on ImageNet 256×256. Notably, in language modeling, DDPD significantly reduces the performance gap between diffusion-based and autoregressive methods in terms of generative perplexity. Code is available at https://github.com/liusulin/DDPD.
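A minimal sketch of the plan-and-denoise loop as described: the planner scores how corrupted each position is, the most corrupted position is denoised next, and the loop repeats until the planner finds nothing left to fix. Both models here are toy stand-ins (the denoiser even peeks at the clean sequence), purely to show the control flow.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE, MASK = 4, -1
clean = rng.integers(0, VOCAB_SIZE, size=12)
seq = np.where(rng.uniform(size=12) < 0.5, MASK, clean)  # partially corrupted input

def planner(seq):
    """Stand-in planner: probability that each position is still corrupted.
    A trained planner would also flag previously denoised but wrong tokens."""
    return np.where(seq == MASK, 1.0, 0.05)

def denoiser(seq, i):
    """Stand-in denoiser for position i (here it peeks at the clean token)."""
    return clean[i]

for _ in range(seq.size):
    scores = planner(seq)
    i = int(np.argmax(scores))     # denoise the most corrupted position next
    if scores[i] < 0.5:            # planner sees nothing left to fix: stop
        break
    seq[i] = denoiser(seq, i)

print(seq, bool((seq == clean).all()))
```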


Predicting perturbation targets with causal differential networks

October 2024 · 38 Reads

Rationally identifying variables responsible for changes to a biological system can enable myriad applications in disease understanding and cell engineering. From a causality perspective, we are given two datasets generated by the same causal model, one observational (control) and one interventional (perturbed). The goal is to isolate the subset of measured variables (e.g. genes) that were the targets of the intervention, i.e. those whose conditional independencies have changed. Knowing the causal graph would limit the search space, allowing us to efficiently pinpoint these variables. However, current algorithms that infer causal graphs in the presence of unknown intervention targets scale poorly to the hundreds or thousands of variables in biological data, as they must jointly search the combinatorial spaces of graphs and consistent intervention targets. In this work, we propose a causality-inspired approach for predicting perturbation targets that decouples the two search steps. First, we use an amortized causal discovery model to separately infer causal graphs from the observational and interventional datasets. Then, we learn to map these paired graphs to the sets of variables that were intervened upon, in a supervised learning framework. This approach consistently outperforms baselines for perturbation modeling on seven single-cell transcriptomics datasets, each with thousands of measured variables. We also demonstrate significant improvements over six causal discovery algorithms in predicting intervention targets across a variety of tractable, synthetic datasets.
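A minimal sketch of the decoupled two-step recipe on synthetic data: infer a graph from the observational and interventional datasets separately, then map the paired graphs to intervention targets. Here a correlation matrix stands in for the amortized causal discovery model, and a simple row-difference heuristic stands in for the paper's supervised predictor; the toy system is constructed so the heuristic succeeds.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, target = 2000, 6, 3

def simulate(intervened):
    """Toy linear SCM; the intervention replaces the mechanism of `target`."""
    x = rng.standard_normal((n, d))
    if intervened:
        x[:, target] = rng.standard_normal(n)
    else:
        x[:, target] = 0.6 * x[:, 1] + 0.6 * x[:, 2] + 0.5 * rng.standard_normal(n)
    return x

def infer_graph(x):
    """Stand-in for amortized causal discovery: absolute correlation matrix."""
    return np.abs(np.corrcoef(x, rowvar=False))

g_obs = infer_graph(simulate(intervened=False))   # step 1 on observational data
g_int = infer_graph(simulate(intervened=True))    # step 1 on interventional data

# Step 2 stand-in: score each variable by how much its connectivity changed.
# (The paper learns this mapping in a supervised fashion instead.)
change = np.abs(g_obs - g_int).sum(axis=1)
print("predicted intervention target:", int(np.argmax(change)))  # expect 3
```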



Citations (52)


... In the aforementioned example, other related facts, such as that Harry Potter is a wizard and Hogwarts is a boarding school of magic for young wizards, should not be forgotten. This capability sets our approach apart from previous methods (Eldan & Russinovich, 2023; Liu et al., 2024a). ...

Reference:

UnStar: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs
Revisiting Who’s Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective
  • Citing Conference Paper
  • January 2024

... In addition to PEP-FOLD3, alternative AI-based approaches can also be used to generate initial peptide conformations. While PEP-FOLD3 was used for structure generation in this study, the use of alternative AI-based tools such as Boltz-1 [61] could provide additional structural diversity and improve docking predictions in future work. ...

Boltz-1: Democratizing Biomolecular Interaction Modeling
  • Citing Preprint
  • November 2024

... Generative models have also shown great potential as efficient surrogates for MD simulations, which are often computationally intensive for complex systems or long-term behaviors. Jing et al. (2024) developed a framework to simulate molecular trajectories using generative models. Similarly, Viguera Diez et al. (2023) used generative models to enhance sampling of slow degrees of freedom, covering the sample space more effectively than traditional MD. ...

Generative Modeling of Molecular Dynamics Trajectories
  • Citing Article
  • September 2024

... An exciting possibility is to incorporate additional thermodynamics into the free energy function, essentially applying the concept of thermodynamically consistent learning [43] to coarse-graining; such a concept has recently been introduced and explored for the coarse-graining of hexane [106]. With a wide range of recent improvements and ideas, machine learning-based coarse-graining is poised to enable accurate simulations on large length and time scales across a wider range of thermodynamic conditions. ...

Thermodynamically Informed Multimodal Learning of High-Dimensional Free Energy Models in Molecular Coarse Graining

... To address this, several works focus on sample selection guided by pre-defined metrics using the Random Search algorithm. Karthik et al. [34] and Liu et al. [48] use pre-trained VQA and human preference models to guide the selection, and Liu et al. [48] further update the proposal distribution during selection to better align with the ground-truth distribution. Similarly, Na et al. [51] perform rejection sampling on the updated proposal distribution during intermediate diffusion denoising steps. ...

Correcting Diffusion Generation Through Resampling
  • Citing Conference Paper
  • June 2024

... Flow matching models have seen wide adoption in speech [23], image generation [7,17,18,22], super-resolution [30], depth estimation [12] and video generation [19], but their application in high-dimensional discrete domains is still limited. Discrete flow matching [3,10,29,31] addresses this limitation, introducing a novel discrete flow paradigm designed for discrete data generation. Building on this, Hu and Ommer [16] validate the efficacy of discrete flow matching in the image domain and bridge the connection between Discrete Diffusion and Masked Generative Models [4]. ...

Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design
  • Citing Article
  • June 2024

... Flow matching (Lipman et al., 2023;Tong et al., 2024) learns a continuous transformation from noise to data via an ODE governed by a vector field. Recent extensions handle discrete data (Gat et al., 2024;Davis et al., 2024;Stark et al., 2024). While these methods circumvent explicit Markovian noising, they often require continuous flow formulations and specialized training objectives. ...

Dirichlet Flow Matching with Applications to DNA Sequence Design
  • Citing Article
  • May 2024

... [150,151] Lastly, work has been carried out towards extending models to multiple ranges of thermodynamic properties like temperature and pressure [152]. This allows simulation of different environments as well as training on previously unsuitable data. By adding extra parameters like temperature to the model input, one can add the corresponding derivatives of the coarse-grained free energy function to the loss. ...

Thermodynamically Informed Multimodal Learning of High-Dimensional Free Energy Models in Molecular Coarse Graining

... To remedy this issue, diffusion models offer an innovative approach, as they have gained considerable attention for their versatile conditioning capabilities [34-38]. These models have been actively applied in the field of materials generation, with drug-like molecules [39,40], proteins [41,42], and small crystals [43] being the main targets. When it comes to generating MOFs using diffusion models, Park et al. [44] focused on the generation of MOF linkers rather than entire structures, while Fu et al. [45] reduced the structural complexity by applying a coarse-grained representation. ...

Diffusion models in protein structure and docking
  • Citing Article
  • April 2024

Wiley Interdisciplinary Reviews: Computational Molecular Science