Jakub Mikolaj Tomczak, PhD
Eindhoven University of Technology (TU/e) · Department of Mathematics and Computer Science
About
137 Publications
47,653 Reads
4,220 Citations
Introduction
I have broad interests in machine learning. My PhD research focused on extracting decision rules from non-stationary data streams. I am interested in deep learning and the Bayesian modeling paradigm, with a particular focus on latent variable models and amortized variational inference.
Publications (137)
Deep hierarchical variational autoencoders (VAEs) are powerful latent variable generative models. In this paper, we introduce Hierarchical VAE with Diffusion-based Variational Mixture of the Posterior Prior (VampPrior). We apply amortization to scale the VampPrior to models with many stochastic layers. The proposed approach allows us to achieve bet...
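For context, the original VampPrior (from the authors' earlier work) replaces the standard Gaussian prior with a mixture of variational posteriors evaluated at K learnable pseudo-inputs; the sketch below states that definition, which the diffusion-based variant above extends:

$$\displaystyle p_{\lambda}(\mathbf{z}) = \frac{1}{K} \sum_{k=1}^{K} q_{\phi}(\mathbf{z} \mid \mathbf{u}_{k}),$$

where u_1, ..., u_K are learnable pseudo-inputs and q_φ is the amortized variational posterior (the encoder).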
Large Language Models (LLMs) have revolutionized AI systems by enabling communication with machines using natural language. Recent developments in Generative AI (GenAI) like Vision-Language Models (GPT-4V) and Gemini have shown great promise in using LLMs as multimodal systems. This new research line results in building Generative AI systems, GenAI...
Let us imagine cats. Most people like cats, and some people are crazy in love with cats. There are ginger cats, black cats, big cats, small cats, puffy cats, and furless cats. In fact, there are many different kinds of cats. However, when I say this word: “a cat,” everyone has some kind of a cat in their mind. One can close their eyes and generate a pict...
In the previous sections, we discussed two approaches to learning p(x): autoregressive models (ARMs) in Chap. 3 and flow-based models (or flows for short) in Chap. 4. Both ARMs and flows model the likelihood function directly, that is, either by factorizing the distribution and parameterizing conditional distributions p(xd|x<d) as in ARMs or by uti...
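For readers skimming this entry, the ARM factorization referred to here is the chain rule of probability applied coordinate-wise:

$$\displaystyle p(\mathbf{x}) = \prod_{d=1}^{D} p(x_d \mid \mathbf{x}_{<d}),$$

where x<d denotes all variables preceding x_d; flows instead obtain the likelihood through the change-of-variables formula for an invertible map.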
So far, we have discussed a class of deep generative models that model the distribution p(x) directly in an autoregressive manner. The main advantage of ARMs is that they can learn long-range statistics and, as a consequence, are powerful density estimators. However, their drawback is that they are parameterized in an autoregressive manner; hence, samp...
I must say that it is hard to come up with a shorter definition of current generative modeling. Once we look at various classes of models, we immediately notice that this is exactly what we try to do: generate data from noise! Don’t believe me? Ok, we should have a look at how various classes of generative models work.
Before we start thinking about (deep) generative modeling, let us consider a simple example. Imagine we have trained a deep neural network that classifies images (\(\mathbf {x} \in \mathbb {Z}^{D}\)) of animals (\(y \in \mathcal {Y}\), and \(\mathcal {Y} = \{cat, dog, horse\}\)). Further, let us assume that this neural network is trained really wel...
Once we discussed latent variable models, we claimed that they naturally define a generative process by first sampling latents z ∼ p(z) and then generating observables x ∼ pθ(x|z). That is nice! However, the problem appears when we start thinking about training. To be more precise, the training objective is an issue. Why? Well, the probability theo...
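The training issue alluded to at the end of this excerpt is that the marginal likelihood requires integrating out the latents, which is intractable in general; variational inference replaces it with the evidence lower bound (ELBO):

$$\displaystyle \log p_{\theta}(\mathbf{x}) = \log \int p_{\theta}(\mathbf{x}|\mathbf{z})\, p(\mathbf{z})\, \mathrm{d}\mathbf{z} \geq \mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})}\left[\log p_{\theta}(\mathbf{x}|\mathbf{z})\right] - \mathrm{KL}\left(q_{\phi}(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z})\right).$$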
In Chap. 1, I tried to convince you that learning the conditional distribution p(y|x) is not enough and, instead, we should focus on the joint distribution p(x, y).
How is it possible, my curious reader, that we can share our thoughts? How can it be that we discuss generative modeling, probability theory, or other interesting concepts? How come? The answer is simple: language. We communicate because the human species developed a pretty distinctive trait that allows us to formulate sounds in a very complex mann...
Before we start discussing how we can model the distribution p(x), we refresh our memory about the core rules of probability theory, namely, the sum rule and the product rule. Let us introduce two random variables x and y.
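For reference, the two rules named here, written for discrete random variables:

$$\displaystyle p(x) = \sum_{y} p(x, y) \quad \text{(sum rule)}, \qquad p(x, y) = p(x|y)\, p(y) = p(y|x)\, p(x) \quad \text{(product rule)}.$$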
So far, we have discussed various deep generative models for modeling the marginal distribution over observable variables (e.g., images), p(x), such as autoregressive models (ARMs), flow-based models (flows, for short), variational autoencoders (VAEs), and hierarchical models like hierarchical VAEs and diffusion-based deep generative models (DDGMs)...
In December 2020, Facebook reported having around 1.8 billion daily active users and around 2.8 billion monthly active users (Facebook reports fourth quarter and full year 2020 results, 2020.). Assuming that users uploaded, on average, a single photo each day, the resulting volume of data would give a very rough (let me stress it, a very rough) est...
Evolutionary robot systems offer two principal advantages: an advanced way of developing robots through evolutionary optimization and a special research platform to conduct what-if experiments regarding questions about evolution. Our study sits at the intersection of these. We investigate the question “What if the 18th-century biologist Lamarck was...
Joint machine learning models that allow synthesizing and classifying data often offer uneven performance between those tasks or are unstable to train. In this work, we depart from a set of empirical observations that indicate the usefulness of internal representations built by contemporary deep diffusion-based generative models not only for genera...
We present an approach for unsupervised learning of geometrically meaningful representations via equivariant variational autoencoders (VAEs) with hyperspherical latent representations. The equivariant encoder/decoder ensures that these latents are geometrically meaningful and grounded in the input space. Mapping these geometry-grounded latents to h...
Evolutionary robot systems offer two principal advantages: an advanced way of developing robots through evolutionary optimization and a special research platform to conduct what-if experiments regarding questions about evolution. Our study sits at the intersection of these. We investigate the question “What if the 18th-century biologist Lamarck wa...
Diffusion models have achieved remarkable success in generating high-quality images thanks to their novel training procedures applied to unprecedented amounts of data. However, training a diffusion model from scratch is computationally expensive. This highlights the need to investigate the possibility of training these models iteratively, reusing c...
Hierarchical Variational Autoencoders (VAEs) are among the most popular likelihood-based generative models. There is rather a consensus that top-down hierarchical VAEs make it possible to effectively learn deep latent structures and avoid problems like posterior collapse. Here, we show that this is not necessarily the case and that the problem of collapsing...
We introduce a joint diffusion model that simultaneously learns meaningful internal representations fit for both generative and predictive tasks. Joint machine learning models that allow synthesizing and classifying data often offer uneven performance between those tasks or are unstable to train. In this work, we depart from a set of empirical obse...
Performant Convolutional Neural Network (CNN) architectures must be tailored to specific tasks in order to consider the length, resolution, and dimensionality of the input data. In this work, we tackle the need for problem-specific CNN architectures. We present the Continuous Convolutional Neural Network (CCNN): a single CNN able to process data of...
We study the problem of combining neural networks with symbolic reasoning. Recently introduced frameworks for Probabilistic Neurosymbolic Learning (PNL), such as DeepProbLog, perform exponential-time exact inference, limiting the scalability of PNL solutions. We introduce Approximate Neurosymbolic Inference (A-NeSI): a new framework for PNL that us...
The use of Convolutional Neural Networks (CNNs) is widespread in Deep Learning due to a range of desirable model properties which result in an efficient and effective machine learning framework. However, performant CNN architectures must be tailored to specific tasks in order to incorporate considerations such as the input length, resolution, and d...
Diffusion-based Deep Generative Models (DDGMs) offer state-of-the-art performance in generative modeling. Their main strength comes from their unique setup in which a model (the backward diffusion process) is trained to reverse the forward diffusion process, which gradually adds noise to the input signal. Although DDGMs are well studied, it is stil...
Simultaneously evolving morphologies (bodies) and controllers (brains) of robots can cause a mismatch between the inherited body and brain in the offspring. To mitigate this problem, the addition of an infant learning period has been proposed relatively long ago by the so-called Triangle of Life approach. However, an empirical assessment is still l...
Variational autoencoders (VAEs) are deep generative models used in various domains. VAEs can generate complex objects and provide meaningful latent representations, which can be further used in downstream tasks such as classification. As previous work has shown, one can easily fool VAEs to produce unexpected latent representations and reconstructio...
Once we discussed latent variable models, we claimed that they naturally define a generative process by first sampling latents z ∼ p(z) and then generating observables x ∼ pθ(x|z). That is nice! However, the problem appears when we start thinking about training. To be more precise, the training objective is an issue. Why? Well, the probability the...
So far, we have discussed a class of deep generative models that model the distribution p(x) directly in an autoregressive manner. The main advantage of ARMs is that they can learn long-range statistics and, as a consequence, are powerful density estimators. However, their drawback is that they are parameterized in an autoregressive manner; hence, samp...
In the previous chapters, we discussed two approaches to learning p(x): autoregressive models (ARMs) in Chap. 2 and flow-based models (or flows for short) in Chap. 3. Both ARMs and flows model the likelihood function directly, that is, either by factorizing the distribution and parameterizing conditional distributions p(xd|x<d) as in ARMs or by u...
So far, we have discussed various deep generative models for modeling the marginal distribution over observable variables (e.g., images), p(x), such as autoregressive models (ARMs), flow-based models (flows, for short), Variational Auto-Encoders (VAEs), and hierarchical models like hierarchical VAEs and diffusion-based deep generative models (DDGM...
In Chap. 1, I tried to convince you that learning the conditional distribution p(y|x) is not enough and, instead, we should focus on the joint distribution p(x, y) factorized as follows:
$$\displaystyle p(\mathbf {x}, y) = p(y|\mathbf {x})\, p(\mathbf {x}) .$$
In December 2020, Facebook reported having around 1.8 billion daily active users and around 2.8 billion monthly active users. Assuming that users uploaded, on average, a single photo each day, the resulting volume of data would give a very rough (let me stress it: a very rough) estimate of around 3000 TB of new images per day. This single case of Fa...
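The arithmetic behind this back-of-the-envelope estimate, assuming an average photo size of roughly 1.7 MB (the photo size is my assumption, not a figure from the text):

$$\displaystyle 1.8 \times 10^{9}\ \text{photos/day} \times 1.7\ \text{MB} \approx 3.1 \times 10^{15}\ \text{B} \approx 3000\ \text{TB per day}.$$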
Before we start discussing how we can model the distribution p(x), we refresh our memory about the core rules of probability theory, namely, the sum rule and the product rule. Let us introduce two random variables x and y. Their joint distribution is p(x, y). The product rule allows us to factorize the joint distribution in two manners, namely:
$$\displaystyle p(x, y) = p(x|y)\, p(y) = p(y|x)\, p(x) .$$
Simultaneously evolving morphologies (bodies) and controllers (brains) of robots can cause a mismatch between the inherited body and brain in the offspring. To mitigate this problem, the addition of an infant learning period has been proposed relatively long ago by the so-called Triangle of Life framework. However, an empirical assessment is still...
When designing Convolutional Neural Networks (CNNs), one must select the size of the convolutional kernels before training. Recent works show CNNs benefit from different kernel sizes at different layers, but exploring all possible combinations is infeasible in practice. A more efficient approach is to learn the kernel size during training. However,...
Spiking neural networks are a promising approach towards next-generation models of the brain in computational neuroscience. Moreover, compared to classic artificial neural networks, they could serve as an energy-efficient deployment of AI by enabling fast computation in specialized neuromorphic hardware. However, training deep spiking neural networ...
Not all generate-and-test search algorithms are created equal. Bayesian Optimization (BO) invests a lot of computation time to generate the candidate solution that best balances the predicted value and the uncertainty given all previous data, taking increasingly more time as the number of evaluations performed grows. Evolutionary Algorithms (EA) on...
When controllers (brains) and morphologies (bodies) of robots evolve simultaneously, this can lead to a problem, namely the brain & body mismatch problem. In this research, we propose lifetime learning as a solution. We set up a system where modular robots can create offspring that inherit the bodies of parents by recombination and mutation. With r...
The vision behind this paper looks ahead to evolutionary robot systems where morphologies and controllers are evolved together and ‘newborn’ robots undergo a learning process to optimize their inherited brain for the inherited body. The specific problem we address is learning controllers for the task of directed locomotion in evolvable modular robo...
Density estimation, compression, and data generation are crucial tasks in artificial intelligence. Variational Auto-Encoders (VAEs) constitute a single framework to achieve these goals. Here, we present a novel class of generative models, called self-supervised Variational Auto-Encoder (selfVAE), which utilizes deterministic and discrete transforma...
In biology and medicine, cell counting is one of the most important elements of cytometry, with applications to research and clinical practice. For instance, the complete cell count could help to determine conditions for which cancer cells could grow or not. However, cell counting is a laborious and time-consuming process, and its automation is...
In this study, the classification of white cabbage seedling images is modeled with convolutional neural networks. We focus on a dataset that tracks the seedling growth over a period of 14 days, where photos were taken at four specific moments. The dataset contains 13,200 individual seedlings with corresponding labels and was retrieved from Bejo, a...
Modelers use automatic differentiation of computation graphs to implement complex Deep Learning models without defining gradient computations. However, modelers often use sampling methods to estimate intractable expectations such as in Reinforcement Learning and Variational Inference. Current methods for estimating gradients through these sampling...
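As a minimal sketch of the two classic estimators that such frameworks build on (generic illustration code, not the paper's method), consider estimating the gradient of E[f(x)] with respect to µ for x ~ N(µ, σ²):

```python
# Two standard gradient estimators for E_{q(x; mu, sigma)}[f(x)]:
# the reparameterization (pathwise) estimator and the score-function
# (REINFORCE) estimator. Toy illustration only.
import torch

mu = torch.tensor(0.5, requires_grad=True)
sigma = torch.tensor(1.0)

def f(x):                                   # toy objective
    return (x - 2.0) ** 2

# Reparameterization: x = mu + sigma * eps, eps ~ N(0, 1)
eps = torch.randn(10_000)
loss_rep = f(mu + sigma * eps).mean()
grad_rep = torch.autograd.grad(loss_rep, mu)[0]

# Score function: grad = E[f(x) * d/dmu log q(x; mu, sigma)]
with torch.no_grad():                       # sample without a graph
    x = mu + sigma * torch.randn(100_000)
log_q = torch.distributions.Normal(mu, sigma).log_prob(x)
grad_sf = torch.autograd.grad((f(x) * log_q).mean(), mu)[0]

# Both approximate the true gradient d/dmu E[f(x)] = 2 * (mu - 2) = -3
print(grad_rep.item(), grad_sf.item())
```

Both estimators target the same gradient; the pathwise one typically has far lower variance, which is why it is preferred whenever the sampling step is reparameterizable.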
In this work, we explore adversarial attacks on the Variational Autoencoders (VAE). We show how to modify a data point to obtain a prescribed latent code (supervised attack) or just get a drastically different code (unsupervised attack). We examine the influence of model modifications (β-VAE, NVAE) on the robustness of VAEs and suggest metrics...
Many real-life processes are black-box problems, i.e., the internal workings are inaccessible or a closed-form mathematical expression of the likelihood function cannot be defined. For continuous random variables, likelihood-free inference problems can be solved via Approximate Bayesian Computation (ABC). However, an optimal alternative for discret...
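As background, classic rejection ABC (the baseline such likelihood-free methods start from; the toy simulator below is my own, not from the paper) looks like this:

```python
# Rejection ABC on a toy Gaussian simulator: draw theta from the prior,
# simulate data, and keep theta when the simulation is close to the
# observations under a summary-statistic distance.
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(1.0, 1.0, size=100)       # stand-in for real data

def simulator(theta, n=100):
    return rng.normal(theta, 1.0, size=n)

def distance(a, b):                             # compare summary statistics
    return abs(a.mean() - b.mean())

accepted = [
    theta
    for theta in rng.uniform(-5, 5, size=10_000)   # prior draws
    if distance(simulator(theta), observed) < 0.1  # tolerance epsilon
]

print(len(accepted), np.mean(accepted))  # approximate posterior mean ~ 1.0
```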
The challenge of robotic reproduction -- making of new robots by recombining two existing ones -- has been recently cracked and physically evolving robot systems have come within reach. Here we address the next big hurdle: producing an adequate brain for a newborn robot. In particular, we address the task of targeted locomotion which is arguably a funda...
An efficient treatment for COVID-19, the disease caused by the novel coronavirus SARS-CoV-2 (CoV2), remains a challenge. The papain-like protease (PLpro) from the human coronavirus is a protease that plays a critical role in virus replication. Moreover, CoV2 uses this enzyme to modulate the host’s immune system to its own benefit. Therefore, it...
We introduce Invertible Dense Networks (i-DenseNets), a more parameter-efficient alternative to Residual Flows. The method relies on an analysis of the Lipschitz continuity of the concatenation in DenseNets, where we enforce invertibility of the network by satisfying a Lipschitz constraint. We extend this method by proposing a learnable concatenati...
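A minimal sketch of why a Lipschitz bound buys invertibility (this illustrates the Residual Flows-style residual block, not the i-DenseNets concatenation itself): if y = x + g(x) with Lip(g) < 1, the inverse can be computed by fixed-point iteration.

```python
# Invert a residual block y = x + g(x) with Lip(g) < 1 via the
# fixed-point iteration x <- y - g(x), which converges by the
# Banach fixed-point theorem.
import torch

g = torch.nn.Sequential(torch.nn.Linear(2, 2), torch.nn.Tanh())
with torch.no_grad():
    # Crude rescaling: the Frobenius norm bounds the spectral norm,
    # so this enforces Lip(g) <= 0.5 (Tanh is 1-Lipschitz).
    g[0].weight *= 0.5 / g[0].weight.norm(2)

def forward(x):
    return x + g(x)

def inverse(y, iters=50):
    x = y.clone()
    for _ in range(iters):
        x = y - g(x)          # contraction mapping
    return x

x = torch.randn(4, 2)
with torch.no_grad():
    print((inverse(forward(x)) - x).abs().max())  # near zero: round-trip works
```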
Conventional neural architectures for sequential data present important limitations. Recurrent networks suffer from exploding and vanishing gradients, small effective memory horizons, and must be trained sequentially. Convolutional networks are unable to handle sequences of unknown size and their memory horizon must be defined a priori. In this wor...
Cancer cell metabolism is dependent on cell-intrinsic factors, such as genetics, and cell-extrinsic factors, such as nutrient availability. In this context, understanding how these two aspects interact and how diet influences cellular metabolism is important for developing personalized treatment. In order to achieve this goal, genome-scale metabolic m...
Motivation
The gut microbiota is the human body’s largest population of microorganisms that interact with human intestinal cells. They use ingested nutrients for fundamental biological processes and have important impacts on human physiology, immunity, and the metabolome in the gastrointestinal tract.
Results
Here, we present M2R, a Python add-on to c...
One of the central elements in systems biology is the interaction between mathematical modeling and measured quantities. Typically, biological phenomena are represented as dynamical systems, and they are further analyzed and comprehended by identifying model parameters using experimental data. However, not all model parameters can be found by gradie...
In this paper, we present a new class of invertible transformations. We indicate that many well-known invertible transformations in reversible logic and reversible neural networks could be derived from our proposition. Next, we propose two new coupling layers that are important building blocks of flow-based generative models. In the preliminary expe...
The challenge of robotic reproduction -- making of new robots by recombining two existing ones -- has been recently cracked and physically evolving robot systems have come within reach. Here we address the next big hurdle: producing an adequate brain for a newborn robot. In particular, we address the task of targeted locomotion which is arguably a...
Many real-life problems are represented as a black-box, i.e., the internal workings are inaccessible or a closed-form mathematical expression of the likelihood function cannot be defined. For continuous random variables, likelihood-free inference problems can be solved by a group of methods under the name of Approximate Bayesian Computation (ABC). H...
Density estimation, compression and data generation are crucial tasks in artificial intelligence. Variational Auto-Encoders (VAEs) constitute a single framework to achieve these goals. Here, we present a novel class of generative models, called self-supervised Variational Auto-Encoder (selfVAE), that utilizes deterministic and discrete variational...
We introduce Invertible Dense Networks (i-DenseNets), a more parameter efficient alternative to Residual Flows. The method relies on an analysis of the Lipschitz continuity of the concatenation in DenseNets, where we enforce the invertibility of the network by satisfying the Lipschitz constraint. Additionally, we extend this method by proposing a l...
Models in systems biology are mathematical descriptions of biological processes that are used to answer questions and gain a better understanding of biological phenomena. Dynamic models represent the network through the rates of production and consumption of the individual species. The ordinary differential equations that describe the rates of the rea...
Cancer cell metabolism is dependent on cell-intrinsic factors like genetics, and cell-extrinsic factors like nutrient availability. In this context, understanding how these two aspects interact and how diet influences cellular metabolism is important for developing personalized treatment. In order to achieve this goal, genome-scale metabolic models...
The framework of variational autoencoders (VAEs) provides a principled method for jointly learning latent-variable models and corresponding inference models. However, the main drawback of this approach is the blurriness of the generated images. Some studies link this effect to the objective function, namely, the (negative) log-likelihood. Here, we...
Inducing symmetry equivariance in deep neural architectures has resulted in improved data efficiency and generalization. In this work, we utilize the concept of scale and translation equivariance to tackle the problem of learning on time-series from raw waveforms. As a result, we obtain representations that largely resemble those of the wavelet t...
This paper introduces a new method to build linear flows, by taking the exponential of a linear transformation. This linear transformation does not need to be invertible itself, and the exponential has the following desirable properties: it is guaranteed to be invertible, its inverse is straightforward to compute and the log Jacobian determinant is...
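The key facts that make this construction work are standard properties of the matrix exponential, stated below for a square matrix M (which indeed need not be invertible itself):

$$\displaystyle f(\mathbf{x}) = e^{M}\mathbf{x}, \qquad f^{-1}(\mathbf{y}) = e^{-M}\mathbf{y}, \qquad \log\left|\det e^{M}\right| = \mathrm{tr}(M),$$

since e^M is always invertible with inverse e^{-M} and det(e^M) = e^{tr(M)}.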
A collection of twelve organoselenium compounds, structural analogues of the antioxidant drug ebselen, were screened for inhibition of the papain-like protease (PLpro) from the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, CoV2). This cysteine protease, being responsible for the hydrolysis of peptide bonds between specific amino acids, plays a...
Since December 2019, a novel coronavirus identified as SARS-CoV-2 or COV2 has been spreading around the world. By the 16th of May, around 4.5 million people had been infected and over 300,000 had died due to COV2 infection. An effective treatment remains a challenge. Targeted therapeutics are still under investigation. The papain-like protease (PL P...
Machine learning models trained with purely observational data and the principle of empirical risk minimization (Vapnik, 1992) can fail to generalize to unseen domains. In this paper, we focus on the case where the problem arises through spurious correlation between the observed domains and the actual task labels. We find that many domain generaliz...
Convolutional Neural Networks experience catastrophic forgetting when optimized on a sequence of learning problems: as they meet the objective of the current training examples, their performance on previous tasks drops drastically. In this work, we introduce a novel framework to tackle this problem with conditional computation. We equip each convol...
Differential evolution (DE) is a well-known type of evolutionary algorithm (EA). Similarly to other EA variants, it can suffer from small populations and lose diversity too quickly. This paper presents a new approach to mitigate this issue: we propose to generate new candidate solutions by utilizing a reversible linear transformation applied to a tr...
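For readers unfamiliar with DE, here is a minimal sketch of the classic DE/rand/1 step that such variants modify (the paper's reversible linear transformation is not reproduced here):

```python
# Classic differential evolution (DE/rand/1/bin) on a toy sphere function.
import numpy as np

rng = np.random.default_rng(0)
sphere = lambda x: float(np.sum(x ** 2))        # objective to minimize
F, CR, dim, n = 0.8, 0.9, 5, 20                 # DE hyperparameters
pop = rng.uniform(-5, 5, size=(n, dim))

for _ in range(200):
    for i in range(n):
        r1, r2, r3 = rng.choice([j for j in range(n) if j != i], 3, replace=False)
        mutant = pop[r1] + F * (pop[r2] - pop[r3])              # DE/rand/1 mutation
        trial = np.where(rng.random(dim) < CR, mutant, pop[i])  # binomial crossover
        if sphere(trial) < sphere(pop[i]):                      # greedy selection
            pop[i] = trial

print(min(sphere(x) for x in pop))  # close to 0 after a few hundred steps
```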
Although group convolutional networks are able to learn powerful representations based on symmetry patterns, they lack explicit means to learn meaningful relationships among them (e.g., relative positions and poses). In this paper, we present attentive group equivariant convolutions, a generalization of the group convolution, in which attention is...
Media is generally stored digitally and is therefore discrete. Many successful deep distribution models in deep learning learn a density, i.e., the distribution of a continuous random variable. Naïve optimization on discrete data leads to arbitrarily high likelihoods, and instead, it has become standard practice to add noise to datapoints. In thi...
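The standard uniform-dequantization bound behind the "add noise" practice (a textbook fact, not necessarily this paper's proposal): for images x ∈ {0, ..., 255}^D and a density model p,

$$\displaystyle \log P(\mathbf{x}) = \log \int_{[0,1)^{D}} p(\mathbf{x} + \mathbf{u})\, \mathrm{d}\mathbf{u} \geq \mathbb{E}_{\mathbf{u} \sim \mathcal{U}[0,1)^{D}}\left[\log p(\mathbf{x} + \mathbf{u})\right],$$

by Jensen's inequality, so maximizing the continuous likelihood of noisy data lower-bounds the discrete likelihood.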
We generalize the well-studied problem of gait learning in modular robots in two dimensions. Firstly, we address locomotion in a given target direction that goes beyond learning a typical undirected gait. Secondly, rather than studying one fixed robot morphology we consider a test suite of different modular robots. This study is based on our intere...
This paper presents a new general framework for turning any auto-encoder into a generative model. Here, we focus on a specific instantiation of the auto-encoder that consists of the Short Time Fourier Transform as an encoder, and a composition of the Griffin-Lim Algorithm and the pseudo inverse of the Short Time Fourier Transform as a decoder. In o...
Learning suitable latent representations for observed, high-dimensional data is an important research topic underlying many recent advances in machine learning. While traditionally the Gaussian normal distribution has been the go-to latent parameterization, recently a variety of works have successfully proposed the use of manifold-valued latents. I...
This paper introduces a new approach to maximum likelihood learning of the parameters of a restricted Boltzmann machine (RBM). The proposed method is based on the Perturb-and-MAP (PM) paradigm that enables sampling from the Gibbs distribution. PM is a two-step process: (i) perturb the model using Gumbel perturbations, then (ii) find the maximum a p...
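The Gumbel perturbation idea in step (i) is easiest to see in the categorical case (the Gumbel-max trick; the paper applies the same paradigm to RBMs, which is not reproduced here):

```python
# Gumbel-max trick: perturb log-potentials with Gumbel(0, 1) noise and
# take the MAP (argmax); the argmax is an exact categorical sample.
import numpy as np

rng = np.random.default_rng(0)
logits = np.log(np.array([0.1, 0.2, 0.7]))      # unnormalized log-potentials

samples = []
for _ in range(100_000):
    gumbel = -np.log(-np.log(rng.random(3)))    # Gumbel(0, 1) noise
    samples.append(int(np.argmax(logits + gumbel)))  # MAP of perturbed model

print(np.bincount(samples) / len(samples))      # ~ [0.1, 0.2, 0.7]
```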
In this paper we present a deep generative model for lossy video compression. We employ a model that consists of a 3D autoencoder with a discrete latent space and an autoregressive prior used for entropy coding. Both autoencoder and prior are trained jointly to minimize a rate-distortion loss, which is closely related to the ELBO used in variatio...
The Michaelis-Menten equation is one of the most extensively used models in biochemistry for studying enzyme kinetics. However, this model requires several measurements (e.g., eight or more) at different substrate concentrations to determine the kinetic parameters. Here, we report the discovery of a novel tool for calculating kinetic const...
We consider the problem of domain generalization, namely, how to learn representations given data from a set of domains that generalize to data from a previously unseen domain. We propose the Domain Invariant Variational Autoencoder (DIVA), a generative model that tackles this problem by learning three independent latent subspaces, one for the doma...
Optimizing the execution time of a tensor program, e.g., a convolution, involves finding its optimal configuration. Searching the configuration space exhaustively is typically infeasible in practice. In line with recent research using TVM, we propose to learn a surrogate model to overcome this issue. The model is trained on an acyclic graph called an...
This paper focuses on Bayesian Optimization (typically considered with continuous inputs) for discrete search input spaces, including integer, categorical, or graph-structured input variables. In Gaussian process-based Bayesian Optimization a problem arises, as it is not straightforward to define a proper kernel on discrete input structures, where...
De novo designed helix-loop-helix peptide foldamers containing cis-2-aminocyclopentanecarboxylic acid residues were evaluated for their conformational stability and possible use in enzyme mimetic development. The correlation between hydrogen bond network size and conformational stability was demonstrated through CD and NMR spectroscopies. Molecules...
Decision making is a process that is extremely prone to different biases. In this paper, we consider learning fair representations that aim at removing nuisance (sensitive) information from the decision process. For this purpose, we propose to use deep generative modeling and adapt a hierarchical Variational Auto-Encoder to learn fair representations...
The Variational Auto-Encoder (VAE) is one of the most used unsupervised machine learning models. Although the default choice of a Gaussian distribution for both the prior and the posterior is mathematically convenient and often leads to competitive results, we show that this parameterization fails to model data with a latent hy...