Noah Hollmann’s research while affiliated with University of Freiburg and other places


Publications (10)


Overview of the proposed method
a, The high-level overview of TabPFN pre-training and usage. b, The TabPFN architecture. We train a model to solve more than 100 million synthetic tasks. Our architecture adapts the standard transformer encoder to the two-dimensional data encountered in tables.
Overview of the TabPFN prior
a, For each dataset, we first sample high-level hyperparameters. b, Based on these hyperparameters, we construct a structural causal model that encodes the computational function generating the dataset. Each node holds a vector and each edge in the computational graph implements a function according to one of the connection types. In step 1, using random noise variables we generate initialization data, which is fed into the root nodes of the graphs and propagated through the computational graph for each to-be-generated sample. In step 2, we randomly sample feature and target node positions in the graph, labelled F and T, respectively. In step 3, we extract the intermediate data representations at the sampled feature and target node positions. In step 4, we post-process the extracted data. c, We retrieve the final datasets. We plot interactions of feature pairs and the node colour represents the class of the sample.
The behaviour of TabPFN and a set of baselines on simple functions
In all plots, we use orange for the ground truth and blue for model predictions. a, Each column represents a different toy function, each having a single feature (along the x-axis) and a target (along the y-axis). TabPFN can model a lot of different functions, including noisy functions. b, TabPFN can model distributions over outputs out of the box, which is exemplified by predicting the light intensity pattern in a double-slit experiment after observing the positions of 1,000 photons.
Comparison of TabPFN on our test benchmarks, containing datasets with up to 10,000 samples and 500 features
Performance was normalized per dataset before aggregation using all baselines; intervals represent the 95% confidence interval. Wilcoxon P refers to the two-sided Wilcoxon signed-rank test P value⁵⁴. a, Average performance of the default as well as the tuned versions of TabPFN and our baselines. All methods are tuned for ROC AUC or RMSE, respectively, thus decreasing the representativeness of the secondary metrics. LGBM, LightGBM; MLP, multilayer perceptron; SVM, support vector machines; RF, random forest; CB, CatBoost; XGB, XGBoost; Lin, logistic regression for classification and ridge regression for regression tasks. Plots on the right-hand side show a magnified analysis of the strongest baselines considered. b, A per-dataset comparison of TabPFN with its strongest baseline, CatBoost. Each dot is the average score on one dataset. c, The impact of hyperparameter tuning for the considered methods. The x-axis shows the average time required to fit and predict with the algorithm.
Robustness across datasets and performance comparison with tuned ensembles
a, A comparison of modified datasets. We can see that TabPFN is not more vulnerable to the modifications compared with baselines. We also see that TabPFN reproduces the accuracy of CatBoost (default) with only half the training samples provided. Here we normalize scores per dataset (sharing one normalization across all modifications of one experiment) to avoid negative outliers. b, We split the test datasets by data characteristics and analyse the performance per subgroup. c, Classification performance. Left, the win rate of TabPFN (PHE) against AutoGluon (with one tie excluded); right, the ROC AUC score over time for tuning each method, with the first marker representing the default configuration for the non-ensembling methods. d, Regression performance presented as in c but using the RMSE metric. Intervals represent the 95% confidence interval and Wilcoxon P refers to the two-sided Wilcoxon signed-rank test P value⁵⁴.
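
The four steps in the "Overview of the TabPFN prior" caption above can be illustrated with a toy sketch. This is not the authors' generator: the scalar node values, the tanh edge function, the edge probability of 0.5, noise at every node rather than only at root nodes, and the median-based class labels are all simplifying assumptions, and the function name `sample_scm_dataset` is hypothetical.

```python
import math
import random


def sample_scm_dataset(rng, n_nodes=6, n_samples=20, n_features=3):
    """Generate one synthetic table from a randomly sampled causal graph (toy version)."""
    # Random DAG: node j may only receive edges from nodes i < j (assumed edge prob. 0.5).
    weights = {
        (i, j): rng.gauss(0.0, 1.0)
        for j in range(n_nodes)
        for i in range(j)
        if rng.random() < 0.5
    }
    # Step 2: sample which graph nodes act as features (F) and which as the target (T).
    chosen = rng.sample(range(n_nodes), n_features + 1)
    feature_nodes, target_node = chosen[:-1], chosen[-1]

    rows, targets = [], []
    for _ in range(n_samples):
        values = []
        for j in range(n_nodes):
            # Step 1: propagate noise through the graph (simplification: every
            # node receives its own noise term plus its parents' weighted values).
            total = rng.gauss(0.0, 1.0)
            for i in range(j):
                if (i, j) in weights:
                    total += weights[(i, j)] * values[i]
            values.append(math.tanh(total))
        # Step 3: extract the intermediate values at the sampled positions.
        rows.append([values[k] for k in feature_nodes])
        targets.append(values[target_node])

    # Step 4 (post-processing, assumed): binarize the target at its median.
    threshold = sorted(targets)[len(targets) // 2]
    labels = [int(t > threshold) for t in targets]
    return rows, labels


X, y = sample_scm_dataset(random.Random(0))
```

Each call yields one small classification table; a prior of this kind is sampled many times to produce the synthetic tasks used for pre-training.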


Accurate predictions on small data with a tabular foundation model
  • Article
  • Full-text available

January 2025 · 119 Reads · 5 Citations

Nature

Noah Hollmann · Samuel Müller · [...]

Tabular data, spreadsheets organized in rows and columns, are ubiquitous across scientific fields, from biomedicine to particle physics to economics and climate science¹,². The fundamental prediction task of filling in missing values of a label column based on the rest of the columns is essential for various applications as diverse as biomedical risk models, drug discovery and materials science. Although deep learning has revolutionized learning from raw data and led to numerous high-profile success stories³⁻⁵, gradient-boosted decision trees⁶⁻⁹ have dominated tabular data for the past 20 years. Here we present the Tabular Prior-data Fitted Network (TabPFN), a tabular foundation model that outperforms all previous methods on datasets with up to 10,000 samples by a wide margin, using substantially less training time. In 2.8 s, TabPFN outperforms an ensemble of the strongest baselines tuned for 4 h in a classification setting. As a generative transformer-based foundation model, this model also allows fine-tuning, data generation, density estimation and learning reusable embeddings. TabPFN is a learning algorithm that is itself learned across millions of synthetic datasets, demonstrating the power of this approach for algorithm development. By improving modelling abilities across diverse fields, TabPFN has the potential to accelerate scientific discovery and enhance important decision-making in various domains.


Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data

November 2024 · 2 Reads

While most ML models expect independent and identically distributed data, this assumption is often violated in real-world scenarios due to distribution shifts, resulting in the degradation of machine learning model performance. Until now, no tabular method has consistently outperformed classical supervised learning, which ignores these shifts. To address temporal distribution shifts, we present Drift-Resilient TabPFN, a fresh approach based on In-Context Learning with a Prior-Data Fitted Network that learns the learning algorithm itself: it accepts the entire training dataset as input and makes predictions on the test set in a single forward pass. Specifically, it learns to approximate Bayesian inference on synthetic datasets drawn from a prior that specifies the model's inductive bias. This prior is based on structural causal models (SCMs), which gradually shift over time. To model shifts of these causal models, we use a secondary SCM that specifies changes in the primary model parameters. The resulting Drift-Resilient TabPFN can be applied to unseen data, runs in seconds on small to moderately sized datasets and needs no hyperparameter tuning. Comprehensive evaluations across 18 synthetic and real-world datasets demonstrate large performance improvements over a wide range of baselines, such as XGB, CatBoost, TabPFN, and applicable methods featured in the Wild-Time benchmark. Compared to the strongest baselines, it improves accuracy from 0.688 to 0.744 and ROC AUC from 0.786 to 0.832 while maintaining stronger calibration. This approach could serve as significant groundwork for further research on out-of-distribution prediction.


Bayes' Power for Explaining In-Context Learning Generalizations

October 2024 · 42 Reads

Traditionally, neural network training has been primarily viewed as an approximation of maximum likelihood estimation (MLE). This interpretation originated in a time when training for multiple epochs on small datasets was common and performance was data-bound; but it falls short in the era of large-scale single-epoch trainings ushered in by large self-supervised setups, like language models. In this new setup, performance is compute-bound, but data is readily available. As models became more powerful, in-context learning (ICL), i.e., learning in a single forward pass based on the context, emerged as one of the dominant paradigms. In this paper, we argue that a more useful interpretation of neural network behavior in this era is as an approximation of the true posterior, as defined by the data-generating process. We demonstrate this interpretation's power for ICL and its usefulness to predict generalizations to previously unseen tasks. We show how models become robust in-context learners by effectively composing knowledge from their training data. We illustrate this with experiments that reveal surprising generalizations, all explicable through the exact posterior. Finally, we show the inherent constraints of the generalization capabilities of posteriors and the limitations of neural networks in approximating these posteriors.


FairPFN: Transformers Can do Counterfactual Fairness

July 2024 · 20 Reads

Machine learning systems are increasingly prevalent across healthcare, law enforcement, and finance but often operate on historical data, which may carry biases against certain demographic groups. Causal and counterfactual fairness provides an intuitive way to define fairness that closely aligns with legal standards. Despite its theoretical benefits, counterfactual fairness comes with several practical limitations, largely related to the reliance on domain knowledge and approximate causal discovery techniques in constructing a causal model. In this study, we take a fresh perspective on counterfactually fair prediction, building upon recent work in in-context learning (ICL) and prior-data fitted networks (PFNs) to learn a transformer called FairPFN. This model is pretrained using synthetic fairness data to eliminate the causal effects of protected attributes directly from observational data, removing the requirement of access to the correct causal model in practice. In our experiments, we thoroughly assess the effectiveness of FairPFN in eliminating the causal impact of protected attributes on a series of synthetic case studies and real-world datasets. Our findings pave the way for a new and promising research area: transformers for causal and counterfactual fairness.


Figure 1: Our proposed Prior-data Fitted Network almost exactly approximates a Gaussian Process posterior with fixed hyperparameters. We plot the exact and approximated GP prediction for (i) the mean and (ii) expected improvement. For the simple GP model approximated here, a ground truth can be exactly calculated, which is generally not the case, see Section 4.1. PFNs, however, can be extended to approximate any prior one can sample from.
Figure 6: A visualisation of the Riemann distribution, with unbounded support. Plot based on (Müller et al., 2021)
Figure 11: In this figure, we compare models trained on a simple GP prior (with fixed hyper-parameters), thus we can compare to the exact posterior of the GP. We show how PFNs behave differently depending on how much they were trained. Vertical lines mark the maximum of the acquisition function.
PFNs Are Flexible Models for Real-World Bayesian Optimization

May 2023 · 127 Reads · 1 Citation

In this paper, we use Prior-data Fitted Networks (PFNs) as a flexible surrogate for Bayesian Optimization (BO). PFNs are neural processes that are trained to approximate the posterior predictive distribution (PPD) for any prior distribution that can be efficiently sampled from. We describe how this flexibility can be exploited for surrogate modeling in BO. We use PFNs to mimic a naive Gaussian process (GP), an advanced GP, and a Bayesian Neural Network (BNN). In addition, we show how to incorporate further information into the prior, such as allowing hints about the position of optima (user priors), ignoring irrelevant dimensions, and performing non-myopic BO by learning the acquisition function. The flexibility underlying these extensions opens up vast possibilities for using PFNs for BO. We demonstrate the usefulness of PFNs for BO in a large-scale evaluation on artificial GP samples and three different hyperparameter optimization testbeds: HPO-B, Bayesmark, and PD1. We publish code alongside trained models at http://github.com/automl/PFNs4BO.


GPT for Semi-Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering

May 2023 · 21 Reads

As the field of automated machine learning (AutoML) advances, it becomes increasingly important to include domain knowledge within these systems. We present an approach for doing so by harnessing the power of large language models (LLMs). Specifically, we introduce Context-Aware Automated Feature Engineering (CAAFE), a feature engineering method for tabular datasets that utilizes an LLM to generate additional semantically meaningful features for tabular datasets based on their descriptions. The method produces both Python code for creating new features and explanations for the utility of the generated features. Despite being methodologically simple, CAAFE enhances performance on 11 out of 14 datasets, ties on 2 and loses on 1, boosting mean ROC AUC performance from 0.798 to 0.822 across all datasets. On the evaluated datasets, this improvement is similar to the average improvement achieved by using a random forest (AUC 0.782) instead of logistic regression (AUC 0.754). Furthermore, our method offers valuable insights into the rationale behind the generated features by providing a textual explanation for each generated feature. CAAFE paves the way for more extensive (semi-)automation in data science tasks and emphasizes the significance of context-aware solutions that can extend the scope of AutoML systems. For reproducibility, we release our code and a simple demo.


Study overview
a, To learn metabolomic states from circulating blood metabolites, the eligible UK Biobank population (with NMR blood metabolomics and valid consent) was split into training, validation and test sets with 22-fold nested cross-validation based on the assigned UK Biobank assessment center. b, For each of the 22 partitions, the metabolomic state model was trained on the 168 metabolomic markers to predict metabolomic risk against 24 common disease endpoints. Subsequently, for each endpoint, CPH models were developed on the metabolomic state in combination with sets of commonly available clinical predictors to model disease risk. Predictions of the CPH model on the test set were aggregated for downstream analysis. c, The metabolomic state model was externally validated in four independent cohorts—the Whitehall II cohort and three from the BBMRI-NL consortium: the Rotterdam Study, the Leiden Longevity Study and the PROSPER cohort. d, In this study we consider clinical predictors from scores commonly applied in primary prevention. We additionally integrate variables into a comprehensive predictor set (PANEL) to investigate overlapping information with the metabolomic state. FH, family history.
Metabolomic state is associated with ORs and stratifies survival
a, Observed event frequency for incident disease plotted against metabolomic state percentiles over the entire study population for all 24 endpoints. b, Cumulative event rates over the observation time for all assessed endpoints, stratified by metabolomic state quantiles (light blue, bottom 10%; blue, median 10%; dark blue, top 10%), with 95% CIs indicated. PAD, peripheral artery disease.
Predictive value of the metabolomic state is endpoint dependent
a, Comparison of discriminative performance of CPH models trained on the metabolomic state only (MET), the three clinical predictor sets (Age+Sex, ASCVD and PANEL) and the sets’ combinations with the metabolomic state. Horizontal dashed lines indicate the median performance of the three clinical predictor sets. b, Differences in discriminative performance between the Age+Sex baseline (dashed line), metabolomic state only (blue) and the combination of Age+Sex and metabolomic state (green). c, Differences in discriminative performance between ASCVD predictors (dashed line), the combination of Age+Sex and the metabolomic state (green) and the combination of metabolomic state and ASCVD predictors (red). d, Difference in discriminative performance between comprehensive PANEL predictors (dashed line), ASCVD + MET (red) and PANEL + MET (black). a–d, Statistical measures were derived from n = 117,981 individuals; those with previous events were excluded (Supplementary Table 1). Data are presented as median (center of error bar) and 95% CI (line of error bar) determined by bootstrapping with 1,000 iterations. b–d, The x-axis range differs across panels; vertical grid lines indicate differences of 0.02 C-index.
Model calibration and additive predictive value of the metabolomic state translate to potential clinical utility
a–c, Calibration curves for CPH models, including baseline parameter sets Age+Sex, ASCVD and PANEL, as well as their combinations with the metabolomic state (Age+Sex + MET) for the endpoints T2D (a), dementia (b) and heart failure (c). d–f, Endpoint-specific net benefit curves standardized by endpoint prevalence, where horizontal solid gray lines indicate ‘treat none’ and vertical solid gray lines indicate ‘treat all’; T2D (d), dementia (e) and heart failure (f). The standardized net benefits of sets Age+Sex, ASCVD and PANEL are compared with Age+Sex + MET and additional non-laboratory predictors of PANEL (PANELnoLaboratory). Green and blue color-filled areas indicate the added benefit of the combination of the metabolomic state and Age+Sex and PANELnoLaboratory, respectively. g–i, Standardized net benefit curves comparing the performance of PANEL + MET against baselines Age+Sex, ASCVD and PANEL; T2D (g), dementia (h) and heart failure (i). Decision curves were derived from n = 111,745 (T2D), n = 117,245 (dementia) and n = 113,636 (heart failure) individuals.
Analysis of the metabolomic state informs on metabolite profiles associated with disease risk
a, Heatmap showing the importance of metabolites in regard to the estimated metabolomic states, represented by absolute global SHAP value estimates per endpoint for the 75 globally most important metabolites. Endpoints are sorted by the discriminative performance of the metabolomic state (left to right; Fig. 3a). b, Global metabolite attributions for T2D; individual attributions are aggregated by percentiles and each dot indicates one percentile. The more distant a dot from the circular baseline, the stronger the absolute attribution for that percentile. Deviations toward the center and periphery represent negative and positive contributions, respectively, to the metabolomic state. Colors indicate the metabolite’s mean plasma value. c, Global metabolite attributions for all-cause dementia. IDL, intermediate-density lipoprotein.
Metabolomic profiles predict individual multidisease outcomes

September 2022 · 486 Reads · 203 Citations

Nature Medicine

Risk stratification is critical for the early identification of high-risk individuals and disease prevention. Here we explored the potential of nuclear magnetic resonance (NMR) spectroscopy-derived metabolomic profiles to inform on multidisease risk beyond conventional clinical predictors for the onset of 24 common conditions, including metabolic, vascular, respiratory, musculoskeletal and neurological diseases and cancers. Specifically, we trained a neural network to learn disease-specific metabolomic states from 168 circulating metabolic markers measured in 117,981 participants with ~1.4 million person-years of follow-up from the UK Biobank and validated the model in four independent cohorts. We found metabolomic states to be associated with incident event rates in all the investigated conditions, except breast cancer. For 10-year outcome prediction for 15 endpoints, with and without established metabolic contribution, a combination of age and sex and the metabolomic state equaled or outperformed established predictors. Moreover, metabolomic state added predictive information over comprehensive clinical variables for eight common diseases, including type 2 diabetes, dementia and heart failure. Decision curve analyses showed that predictive improvements translated into clinical utility for a wide range of potential decision thresholds. Taken together, our study demonstrates both the potential and limitations of NMR-derived metabolomic profiles as a multidisease assay to inform on the risk of many common diseases simultaneously.
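
The decision curve analysis mentioned in the abstract above rests on the net benefit of treating everyone whose predicted risk exceeds a threshold. The sketch below uses the standard definition, net benefit = TP/n − FP/n × pt/(1 − pt), divided by prevalence for the standardized variant. It is not the study's code: the function names, the `>=` thresholding rule and the toy data are assumptions made for illustration.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of treating everyone with predicted risk >= threshold."""
    n = len(y_true)
    true_pos = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    false_pos = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the decision threshold.
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


def standardized_net_benefit(y_true, risk, threshold):
    """Net benefit divided by event prevalence, so the maximum attainable value is 1."""
    prevalence = sum(y_true) / len(y_true)
    return net_benefit(y_true, risk, threshold) / prevalence


# Toy data (made up): 3 events among 10 individuals.
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
risk = [0.9, 0.2, 0.7, 0.6, 0.1, 0.3, 0.4, 0.05, 0.15, 0.25]

model_nb = net_benefit(y_true, risk, 0.5)
treat_all_nb = net_benefit(y_true, [1.0] * len(y_true), 0.5)
model_snb = standardized_net_benefit(y_true, risk, 0.5)
```

Evaluating these functions over a grid of thresholds and comparing against the "treat all" and "treat none" (net benefit 0) strategies gives a decision curve.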


Meta-Learning a Real-Time Tabular AutoML Method For Small Data

July 2022 · 171 Reads · 8 Citations

We present TabPFN, an AutoML method that is competitive with the state of the art on small tabular datasets while being over 1,000× faster. Our method is very simple: it is fully entailed in the weights of a single neural network, and a single forward pass directly yields predictions for a new dataset. Our AutoML method is meta-learned using the Transformer-based Prior-Data Fitted Network (PFN) architecture and approximates Bayesian inference with a prior that is based on assumptions of simplicity and causal structures. The prior contains a large space of structural causal models and Bayesian neural networks with a bias for small architectures and thus low complexity. Furthermore, we extend the PFN approach to differentiably calibrate the prior's hyperparameters on real data. By doing so, we separate our abstract prior assumptions from their heuristic calibration on real data. Afterwards, the calibrated hyperparameters are fixed and TabPFN can be applied to any new tabular dataset at the push of a button. Finally, on 30 datasets from the OpenML-CC18 suite we show that our method outperforms boosted trees and performs on par with complex state-of-the-art AutoML systems with predictions produced in less than a second. We provide all our code and our final trained TabPFN in the supplementary materials.


Figure 1: Selection and characteristics of study population (A) Individuals in the UK Biobank population who withdrew consent, with missing information about their sex or with earlier records of incident myocardial infarction or stroke or lipid-lowering treatment at baseline were excluded. The remaining set was split into training, validation, and test sets in 22-fold nested cross-validation based on the assigned UK Biobank assessment centre. (B) Distribution of observation times for the derived study population. The median observation time was 11·7 years (IQR 11·0-12·3). (C) Kaplan-Meier estimates for the disease-free survival function stratified by sex. (D) Numbers at risk in 5-year intervals stratified by sex.
appendix pp 11, 15). Although we observed improvements in discriminative performance for the Cox model after addition of the PGSs as well, the NeuralCVD model remained superior in C-index (COX plus PGS 0·002, 95% CI 0·002-0·003; COX plus PGS*age 0·002, 0·002-0·003) and NRI (COX plus PGS 0·0424, 95% CI 0·0383-0·0464; COX plus PGS*age 0·0359,
Neural network-based integration of polygenic and clinical information: development and validation of a prediction model for 10-year risk of major adverse cardiac events in the UK Biobank cohort

February 2022 · 255 Reads · 33 Citations

The Lancet Digital Health

Background In primary cardiovascular disease prevention, early identification of high-risk individuals is crucial. Genetic information allows for the stratification of genetic predispositions and lifetime risk of cardiovascular disease. However, towards clinical application, the added value over clinical predictors later in life is crucial. Currently, this genotype–phenotype relationship and implications for overall cardiovascular risk are unclear. Methods In this study, we developed and validated a neural network-based risk model (NeuralCVD) integrating polygenic and clinical predictors in 395 713 cardiovascular disease-free participants from the UK Biobank cohort. The primary outcome was the first record of a major adverse cardiac event (MACE) within 10 years. We compared the NeuralCVD model with both established clinical scores (SCORE, ASCVD, and QRISK3 recalibrated to the UK Biobank cohort) and a linear Cox-Model, assessing risk discrimination, net reclassification, and calibration over 22 spatially distinct recruitment centres. Findings The NeuralCVD score was well calibrated and improved on the best clinical baseline, QRISK3 (ΔConcordance index [C-index] 0·01, 95% CI 0·009–0·011; net reclassification improvement (NRI) 0·0488, 95% CI 0·0442–0·0534) and a Cox model (ΔC-index 0·003, 95% CI 0·002–0·004; NRI 0·0469, 95% CI 0·0429–0·0511) in risk discrimination and net reclassification. After adding polygenic scores we found further improvements on population level (ΔC-index 0·006, 95% CI 0·005–0·007; NRI 0·0116, 95% CI 0·0066–0·0159). Additionally, we identified an interaction of genetic information with the pre-existing clinical phenotype, not captured by conventional models. Additional high polygenic risk increased overall risk most in individuals with low to intermediate clinical risk, and age younger than 50 years. Interpretation Our results demonstrated that the NeuralCVD score can estimate cardiovascular risk trajectories for primary prevention. NeuralCVD learns the transition of predictive information from genotype to phenotype and identifies individuals with high genetic predisposition before developing a severe clinical phenotype. This finding could improve the reprioritisation of otherwise low-risk individuals with a high genetic cardiovascular predisposition for preventive interventions. Funding Charité–Universitätsmedizin Berlin, Einstein Foundation Berlin, and the Medical Informatics Initiative.


Transformers Can Do Bayesian Inference

December 2021 · 41 Reads · 2 Citations

Currently, it is hard to reap the benefits of deep learning for Bayesian methods, which allow the explicit specification of prior knowledge and accurately capture model uncertainty. We present Prior-Data Fitted Networks (PFNs). PFNs leverage large-scale machine learning techniques to approximate a large set of posteriors. The only requirement for PFNs to work is the ability to sample from a prior distribution over supervised learning tasks (or functions). Our method restates the objective of posterior approximation as a supervised classification problem with a set-valued input: it repeatedly draws a task (or function) from the prior, draws a set of data points and their labels from it, masks one of the labels and learns to make probabilistic predictions for it based on the set-valued input of the rest of the data points. Presented with a set of samples from a new supervised learning task as input, PFNs make probabilistic predictions for arbitrary other data points in a single forward propagation, having learned to approximate Bayesian inference. We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems, with over 200-fold speedups in multiple setups compared to current methods. We obtain strong results in very diverse areas such as Gaussian process regression, Bayesian neural networks, classification for small tabular data sets, and few-shot image classification, demonstrating the generality of PFNs. Code and trained PFNs are released at https://github.com/automl/TransformersCanDoBayesianInference.
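
The data-generation loop described in the abstract above (draw a task from the prior, draw data points from it, hold out one label) can be sketched as follows. This is a toy illustration, not the released code: the linear-function prior, the noise level, the bucket range and the names `sample_task`, `sample_training_example` and `label_to_bucket` are all assumptions. The network that would be trained on these examples is omitted.

```python
import random


def sample_task(rng):
    """Draw one task from a toy prior: a random linear function (assumed prior)."""
    slope = rng.gauss(0.0, 1.0)
    intercept = rng.gauss(0.0, 1.0)
    return lambda x: slope * x + intercept


def sample_training_example(rng, n_points=8, noise_std=0.1):
    """Build one training example: context set, query input, held-out label."""
    f = sample_task(rng)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
    ys = [f(x) + rng.gauss(0.0, noise_std) for x in xs]
    held_out = rng.randrange(n_points)  # index of the label that gets masked
    context = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i != held_out]
    return context, xs[held_out], ys[held_out]


def label_to_bucket(y, low=-4.0, high=4.0, n_buckets=16):
    """Discretize a real-valued label so that predicting it becomes classification."""
    clipped = min(max(y, low), high)
    idx = int((clipped - low) / (high - low) * n_buckets)
    return min(idx, n_buckets - 1)


rng = random.Random(0)
context, query_x, target_y = sample_training_example(rng)
target_class = label_to_bucket(target_y)
```

A model trained with a cross-entropy loss on many such `(context, query_x) -> target_class` examples learns to output a predictive distribution for the masked label given the rest of the set.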

Citations (6)


... The first group focuses on developing specialized architectures specifically designed for tabular data. Notable examples within this category include NODE [9], TabNet [10], TabTransformer [11], and TabPFN [12,13]. The Neural Oblivious Decision Ensembles (NODE) [9] are characterized by a deep, layer-wise structure composed of an ensemble of differentiable oblivious trees. ...

Reference:

Tab2Visual: Overcoming Limited Data in Tabular Data Classification Using Deep Learning with Visual Representations
Accurate predictions on small data with a tabular foundation model

Nature

... This can be challenging due to the cost of training the model and the potential of unintended bias in the data. To avoid retraining, general-purpose GenAI-based recommenders could conceivably be trained on synthetic data (e.g., prior-data fitted networks [42]) or on previous optimization data. Regardless, harnessing GenAI's distributional learning capabilities to recommend new solutions to evaluate could significantly accelerate optimization. ...

PFNs Are Flexible Models for Real-World Bayesian Optimization

... Omics sciences represent a very promising instrument to perform the analysis of patients and their biological characteristics within the dynamic context of disease evolution, thus enabling the molecular characterization of a disease onset and evolution, and providing insight into individual susceptibility to drug treatments [5][6][7][8][9][10]. Given these premises, metabolomics and lipoproteomics present themselves as compelling approaches for investigating alterations of multiple biochemical networks throughout the entire course of AD [11][12][13][14][15][16][17][18]. ...

Metabolomic profiles predict individual multidisease outcomes

Nature Medicine

... TabPFN, a modified prior-data fitted network architecture, was used to develop the prediction models [21,22]. Prior-data fitted networks, including TabPFN, are pretrained on synthetic data to emulate Bayesian inference on real-world information. ...

Meta-Learning a Real-Time Tabular AutoML Method For Small Data
  • Citing Preprint
  • July 2022

... Importantly, our approach, based on routine health records, shows large discriminative improvements for the majority of diseases compared with conventionally tested biomarkers [55][56][57] and can be generalized across diverse health systems, populations, and ethnicities. However, we also see that including the medical history over age and sex deteriorated the performance for a subset of 0.7% (UK Biobank) and 5.5% (All Of Us cohort), respectively. ...

Neural network-based integration of polygenic and clinical information: development and validation of a prediction model for 10-year risk of major adverse cardiac events in the UK Biobank cohort

The Lancet Digital Health

... PFNs challenge traditional model training by embracing a vast knowledge base derived from simulated datasets, enabling rapid and insightful inference on new data. Key concepts of PFNs include: Prior-Data Fitted Networks (PFNs) 19 challenge the traditional model training paradigm with a Bayesian-inspired approach. Instead of relying on a single training dataset, PFNs leverage knowledge from a vast collection of simulated datasets. ...

Transformers Can Do Bayesian Inference
  • Citing Preprint
  • December 2021