Mihaela van der Schaar’s research while affiliated with University of Cambridge and other places


Publications (889)


No Equations Needed: Learning System Dynamics Without Relying on Closed-Form ODEs
  • Preprint

January 2025

Krzysztof Kacprzyk · Mihaela van der Schaar

Data-driven modeling of dynamical systems is a crucial area of machine learning. In many scenarios, a thorough understanding of the model's behavior becomes essential for practical applications. For instance, understanding the behavior of a pharmacokinetic model, constructed as part of drug development, may allow us to both verify its biological plausibility (e.g., the drug concentration curve is non-negative and decays to zero) and to design dosing guidelines. Discovery of closed-form ordinary differential equations (ODEs) can be employed to obtain such insights by finding a compact mathematical equation and then analyzing it (a two-step approach). However, its widespread use is currently hindered because the analysis process may be time-consuming, requiring substantial mathematical expertise, or even impossible if the equation is too complex. Moreover, if the found equation's behavior does not satisfy the requirements, editing it or influencing the discovery algorithms to rectify it is challenging as the link between the symbolic form of an ODE and its behavior can be elusive. This paper proposes a conceptual shift to modeling low-dimensional dynamical systems by departing from the traditional two-step modeling process. Instead of first discovering a closed-form equation and then analyzing it, our approach, direct semantic modeling, predicts the semantic representation of the dynamical system (i.e., description of its behavior) directly from data, bypassing the need for complex post-hoc analysis. This direct approach also allows the incorporation of intuitive inductive biases into the optimization algorithm and editing the model's behavior directly, ensuring that the model meets the desired specifications. Our approach not only simplifies the modeling pipeline but also enhances the transparency and flexibility of the resulting models compared to traditional closed-form ODEs.


Methods and clinical applications for eXplainable artificial intelligence in surgery
Can surgeons trust AI? Perspectives on machine learning in surgery and the importance of eXplainable Artificial Intelligence (XAI)
  • Literature Review
  • Full-text available

January 2025


9 Reads

Langenbeck's Archives of Surgery

Purpose: This brief report aims to summarize and discuss the methodologies of eXplainable Artificial Intelligence (XAI) and their potential applications in surgery. Methods: We briefly introduce explainability methods, including global and individual explanatory features, methods for imaging data and time series, as well as similarity classification, and unraveled rules and laws. Results: Given the increasing interest in artificial intelligence within the surgical field, we emphasize the critical importance of transparency and interpretability in the outputs of applied models. Conclusion: Transparency and interpretability are essential for the effective integration of AI models into clinical practice.


Figure 2: Addressing real data challenges is complex and requires multi-step reasoning.
Figure 4: Challenges of Monte Carlo Tree Search (MCTS). We highlight two key drawbacks of MCTS. First, prediction performance cannot serve as a reliable reward, as it may favor data issues such as label leakage or meaningless problem setups (middle). Second, MCTS suffers from low efficiency, requiring experts to endure long waiting times and evaluate a large number of trials (right). In contrast, CliMB-DC's proposed reasoning approach enables immediate backtracking and replanning, significantly enhancing efficiency.
Figure 8: The processing workflow of CliMB-DC on the PBC dataset, illustrating how the coordinator agent, worker agent, and user interact at each processing stage T. The data issues include multiple measurements, missingness, and label leakage. The prediction task is survival analysis, requiring domain-specific model classes.
Figure 9: The processing workflow of CliMB-DC on the Lung Cancer dataset, illustrating how the coordinator agent, worker agent, and user interact at each processing stage T. The data issues include missingness, feature redundancy, and label leakage. The prediction task is survival analysis, requiring domain-specific model classes.
Taxonomy of key data-centric challenges frequently encountered in healthcare machine learning pipelines. While not exhaustive, these categories represent a significant fraction of issues that co-pilots must address to ensure strong predictive performance, robustness, fairness, and clinical feasibility.
Towards Human-Guided, Data-Centric LLM Co-Pilots

January 2025


3 Reads

Evgeny Saveliev · Jiashuo Liu · [...] · Mihaela van der Schaar

Machine learning (ML) has the potential to revolutionize healthcare, but its adoption is often hindered by the disconnect between the needs of domain experts and the translation of these needs into robust and valid ML tools. Despite recent advances in LLM-based co-pilots to democratize ML for non-technical domain experts, these systems remain predominantly focused on model-centric aspects while overlooking critical data-centric challenges. This limitation is problematic in complex real-world settings where raw data often contains complex issues, such as missing values, label noise, and domain-specific nuances requiring tailored handling. To address this, we introduce CliMB-DC, a human-guided, data-centric framework for LLM co-pilots that combines advanced data-centric tools with LLM-driven reasoning to enable robust, context-aware data processing. At its core, CliMB-DC introduces a novel, multi-agent reasoning system that combines a strategic coordinator for dynamic planning and adaptation with a specialized worker agent for precise execution. Domain expertise is then systematically incorporated to guide the reasoning process using a human-in-the-loop approach. To guide development, we formalize a taxonomy of key data-centric challenges that co-pilots must address. Thereafter, to address the dimensions of the taxonomy, we integrate state-of-the-art data-centric tools into an extensible, open-source architecture, facilitating the addition of new tools from the research community. Empirically, using real-world healthcare datasets, we demonstrate CliMB-DC's ability to transform uncurated datasets into ML-ready formats, significantly outperforming existing co-pilot baselines for handling data-centric challenges. CliMB-DC promises to empower domain experts from diverse domains (healthcare, finance, social sciences and more) to actively participate in driving real-world impact using ML.


Automated Ensemble Multimodal Machine Learning for Healthcare

January 2025


6 Reads

IEEE Journal of Biomedical and Health Informatics

The application of machine learning in medicine and healthcare has led to the creation of numerous diagnostic and prognostic models. However, despite their success, current approaches generally issue predictions using data from a single modality. This stands in stark contrast with clinician decision-making which employs diverse information from multiple sources. While several multimodal machine learning approaches exist, significant challenges in developing multimodal systems remain that are hindering clinical adoption. In this paper, we introduce a multimodal framework, AutoPrognosis-M, that enables the integration of structured clinical (tabular) data and medical imaging using automated machine learning. AutoPrognosis-M incorporates 17 imaging models, including convolutional neural networks and vision transformers, and three distinct multimodal fusion strategies. In an illustrative application using a multimodal skin lesion dataset, we highlight the importance of multimodal machine learning and the power of combining multiple fusion strategies using ensemble learning. We have open-sourced our framework as a tool for the community and hope it will accelerate the uptake of multimodal machine learning in healthcare and spur further innovation.


Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes

December 2024


2 Reads

As large language models (LLMs) become increasingly embedded in everyday applications, ensuring their alignment with the diverse preferences of individual users has become a critical challenge. Currently deployed approaches typically assume homogeneous user objectives and rely on single-objective fine-tuning. However, human preferences are inherently heterogeneous, influenced by various unobservable factors, leading to conflicting signals in preference data. Existing solutions addressing this diversity often require costly datasets labelled for specific objectives and involve training multiple reward models or LLM policies, which is computationally expensive and impractical. In this work, we present a novel framework for few-shot steerable alignment, where users' underlying preferences are inferred from a small sample of their choices. To achieve this, we extend the Bradley-Terry-Luce model to handle heterogeneous preferences with unobserved variability factors and propose its practical implementation for reward modelling and LLM fine-tuning. Thanks to our proposed approach of functional parameter-space conditioning, LLMs trained with our framework can be adapted to individual preferences at inference time, generating outputs over a continuum of behavioural modes. We empirically validate the effectiveness of our methods, demonstrating their ability to capture and align with diverse human preferences in a data-efficient manner. Our code is made available at: https://github.com/kasia-kobalczyk/few-shot-steerable-alignment.
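For readers unfamiliar with the Bradley-Terry-Luce model mentioned in the abstract above, the following is a minimal Python sketch of the standard (homogeneous) form only: the probability that one response is preferred to another is a logistic function of the difference in their scalar rewards. It does not reproduce the paper's heterogeneous extension. The function names and the toy reward values are hypothetical and chosen purely for illustration.

```python
import math


def btl_preference_prob(reward_a: float, reward_b: float) -> float:
    """Probability that item a is preferred to item b under the standard
    Bradley-Terry-Luce model: sigmoid(reward_a - reward_b)."""
    diff = reward_a - reward_b
    # Numerically stable sigmoid: avoid exponentiating a large positive number.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def btl_negative_log_likelihood(pairs) -> float:
    """Negative log-likelihood of observed choices.

    pairs: iterable of (reward_of_chosen, reward_of_rejected) tuples
    (hypothetical toy input; real rewards would come from a reward model).
    """
    return -sum(math.log(btl_preference_prob(c, r)) for c, r in pairs)


if __name__ == "__main__":
    # Equal rewards give an even chance; a higher reward is preferred more often.
    print(btl_preference_prob(1.0, 1.0))
    print(btl_preference_prob(2.0, 0.0))
    print(btl_negative_log_likelihood([(2.0, 0.0), (0.5, 1.0)]))
```

Minimizing this negative log-likelihood over a parameterized reward function is the usual way a single reward model is fitted to pairwise preference data; the abstract's point is that one such model cannot represent conflicting preferences across users.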


LLMs for Generalizable Language-Conditioned Policy Learning under Minimal Data Requirements

December 2024


6 Reads

To develop autonomous agents capable of executing complex, multi-step decision-making tasks as specified by humans in natural language, existing reinforcement learning approaches typically require expensive labeled datasets or access to real-time experimentation. Moreover, conventional methods often face difficulties in generalizing to unseen goals and states, thereby limiting their practical applicability. This paper presents TEDUO, a novel training pipeline for offline language-conditioned policy learning. TEDUO operates on easy-to-obtain, unlabeled datasets and is suited for the so-called in-the-wild evaluation, wherein the agent encounters previously unseen goals and states. To address the challenges posed by such data and evaluation settings, our method leverages the prior knowledge and instruction-following capabilities of large language models (LLMs) to enhance the fidelity of pre-collected offline data and enable flexible generalization to new goals and states. Empirical results demonstrate that the dual role of LLMs in our framework, as data enhancers and generalizers, facilitates both effective and data-efficient learning of generalizable language-conditioned policies.


AI education for clinicians

EClinicalMedicine

Rapid advancements in medical AI necessitate targeted educational initiatives for clinicians to ensure AI tools are safe and used effectively to improve patient outcomes. To support decision-making among stakeholders in medical education, we propose three tiers of medical AI expertise and outline the challenges for medical education at different educational stages. Additionally, we offer recommendations and examples, encouraging stakeholders to adapt and shape curricula for their specific healthcare setting using this framework.


Quantifying perturbation impacts for large language models

December 2024


9 Reads

We consider the problem of quantifying how an input perturbation impacts the outputs of large language models (LLMs), a fundamental task for model reliability and post-hoc interpretability. A key obstacle in this domain is disentangling the meaningful changes in model responses from the intrinsic stochasticity of LLM outputs. To overcome this, we introduce Distribution-Based Perturbation Analysis (DBPA), a framework that reformulates LLM perturbation analysis as a frequentist hypothesis testing problem. DBPA constructs empirical null and alternative output distributions within a low-dimensional semantic similarity space via Monte Carlo sampling. Comparisons of Monte Carlo estimates in the reduced dimensionality space enable tractable frequentist inference without relying on restrictive distributional assumptions. The framework is model-agnostic, supports the evaluation of arbitrary input perturbations on any black-box LLM, yields interpretable p-values, supports multiple perturbation testing via controlled error rates, and provides scalar effect sizes for any chosen similarity or distance metric. We demonstrate the effectiveness of DBPA in evaluating perturbation impacts, showing its versatility for perturbation analysis.
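The general idea in the abstract above, comparing a null and an alternative distribution of similarity scores and reporting a p-value, can be illustrated with a generic permutation test. The Python sketch below is not the DBPA procedure itself: the choice of test statistic (absolute difference in means), the permutation scheme, and all names and toy scores are assumptions made for illustration. In practice the scores would be semantic similarities between sampled LLM responses.

```python
import random
from statistics import mean


def permutation_p_value(null_sims, alt_sims, n_permutations=2000, seed=0):
    """Two-sample permutation test on similarity scores.

    null_sims: similarities among responses to the original prompt
               (captures the model's intrinsic stochasticity).
    alt_sims:  similarities between responses to the original prompt
               and responses to the perturbed prompt.
    Returns a Monte Carlo p-value for the absolute difference in means.
    """
    rng = random.Random(seed)
    observed = abs(mean(null_sims) - mean(alt_sims))
    pooled = list(null_sims) + list(alt_sims)
    n_null = len(null_sims)
    exceed = 0
    for _ in range(n_permutations):
        # Randomly reassign scores to the two groups.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_null]) - mean(pooled[n_null:]))
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical toy scores, not data from the paper.
    null_scores = [0.90, 0.91, 0.92, 0.89, 0.90, 0.93, 0.88, 0.91]
    alt_scores = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47]
    print(permutation_p_value(null_scores, alt_scores))
```

A permutation test is used here because, like the approach described in the abstract, it needs no parametric assumption about how the similarity scores are distributed.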


Figure 3. Simulating ATC coding performance for various human deferral percentages for ALIGN (GPT-4o-mini backbone) for both (Panel A) SLE and (Panel B) RA datasets. We see that human-in-the-loop deferral of code predictions leads to significant performance improvements, especially on uncommon codes. Our selection mechanism using entropy for deferral outperforms random deferral, highlighting the capability to capture uncertainty correctly (i.e. pinpoint those instances which would most benefit from human deferral).
MedDRA code prediction accuracy (mean±std) for the three levels of the MedDRA hierarchy: System Organ Class (SOC), Higher Level Group Term (HLGT) and Higher Level Term (HLT) using GPT-4o-mini as the backbone LLM. RAG-based baselines and ALIGN outperform vanilla LLM prompting approaches and they both have relatively similar performance across the three MedDRA levels given the mapping of query-code is largely semantic.
Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding

November 2024


13 Reads

The reuse of historical clinical trial data has significant potential to accelerate medical research and drug development. However, interoperability challenges, particularly with missing medical codes, hinder effective data integration across studies. While Large Language Models (LLMs) offer a promising solution for automated coding without labeled data, current approaches face challenges on complex coding tasks. We introduce ALIGN, a novel compositional LLM-based system for automated, zero-shot medical coding. ALIGN follows a three-step process: (1) diverse candidate code generation; (2) self-evaluation of codes; and (3) confidence scoring and uncertainty estimation enabling human deferral to ensure reliability. We evaluate ALIGN on harmonizing medication terms into Anatomical Therapeutic Chemical (ATC) and medical history terms into Medical Dictionary for Regulatory Activities (MedDRA) codes extracted from 22 immunology trials. ALIGN outperformed the LLM baselines, while also providing capabilities for trustworthy deployment. For MedDRA coding, ALIGN achieved high accuracy across all levels, matching RAG and excelling at the most specific levels (87-90% for HLGT). For ATC coding, ALIGN demonstrated superior performance, particularly at lower hierarchy levels (ATC Level 4), with 72-73% overall accuracy and 86-89% accuracy for common medications, outperforming baselines by 7-22%. ALIGN's uncertainty-based deferral improved accuracy by 17%, to 90%, with 30% deferral, notably enhancing performance on uncommon medications. ALIGN achieves this cost-efficiently at $0.0007 and $0.02 per code for GPT-4o-mini and GPT-4o, reducing barriers to clinical adoption. ALIGN advances automated medical coding for clinical trial data, contributing to enhanced data interoperability and reusability, positioning it as a promising tool to improve clinical research and accelerate drug development.
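The third step described above, deferring uncertain predictions to a human, is stated in the Figure 3 caption to use entropy as the selection mechanism. The Python sketch below shows one simple way such a rule could work: compute the Shannon entropy of each item's distribution over candidate codes and route the highest-entropy fraction to a reviewer. How ALIGN actually derives its confidence scores is not given here, so the input format, function names, and toy probabilities are all assumptions.

```python
import math


def shannon_entropy(probs) -> float:
    """Shannon entropy (in nats) of a discrete distribution; zero terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_deferral(candidate_probs, deferral_fraction):
    """Return the indices of the items to send to a human reviewer.

    candidate_probs:   one probability distribution over candidate codes
                       per item (hypothetical input format).
    deferral_fraction: share of items to defer, e.g. 0.3 for 30%.
    """
    n_defer = round(deferral_fraction * len(candidate_probs))
    # Rank items from most to least uncertain.
    ranked = sorted(
        range(len(candidate_probs)),
        key=lambda i: shannon_entropy(candidate_probs[i]),
        reverse=True,
    )
    return sorted(ranked[:n_defer])


if __name__ == "__main__":
    # Hypothetical toy distributions over three candidate codes.
    probs = [
        [1.00, 0.00, 0.00],  # fully confident
        [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
        [0.90, 0.05, 0.05],  # fairly confident
        [0.50, 0.50, 0.00],  # torn between two codes
    ]
    print(select_for_deferral(probs, 0.5))
```

Ranking by entropy rather than by the top probability alone also takes into account how the remaining probability mass is spread across the alternatives.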


Quantifying Aleatoric Uncertainty of the Treatment Effect: A Novel Orthogonal Learner

November 2024


5 Reads

Estimating causal quantities from observational data is crucial for understanding the safety and effectiveness of medical treatments. However, to make reliable inferences, medical practitioners require not only estimating averaged causal quantities, such as the conditional average treatment effect, but also understanding the randomness of the treatment effect as a random variable. This randomness is referred to as aleatoric uncertainty and is necessary for understanding the probability of benefit from treatment or quantiles of the treatment effect. Yet, the aleatoric uncertainty of the treatment effect has received surprisingly little attention in the causal machine learning community. To fill this gap, we aim to quantify the aleatoric uncertainty of the treatment effect at the covariate-conditional level, namely, the conditional distribution of the treatment effect (CDTE). Unlike average causal quantities, the CDTE is not point identifiable without strong additional assumptions. As a remedy, we employ partial identification to obtain sharp bounds on the CDTE and thereby quantify the aleatoric uncertainty of the treatment effect. We then develop a novel, orthogonal learner for the bounds on the CDTE, which we call AU-learner. We further show that our AU-learner has several strengths in that it satisfies Neyman-orthogonality and is doubly robust. Finally, we propose a fully-parametric deep learning instantiation of our AU-learner.


Citations (36)


... Thus the method needs to go through privacy and utility evaluations and we need to develop more specialized DP-based video face anonymization methods. In synthetic data generation, DP-based methods for theoretical guarantees have been explored for text [99,100], tabular data [119,120], multimodal tabular and 3D image data [121] generation and with FL [122]. However, no DP-based methods have been developed for multimodal therapy session generation. ...

Reference:

Towards Privacy-aware Mental Health AI Models: Advances, Challenges, and Opportunities
Synthetic data for privacy-preserving clinical risk prediction

... Because the desirable properties chosen are easier to comprehend and explain, this increases the model's efficiency and reduces its complexity. Feature selection is required to address the issues of Occam's Razor [28], the curse of dimensionality-overfitting [29], and the challenge of removing unwanted data. ...

Unveiling the Power of Sparse Neural Networks for Feature Selection
  • Citing Chapter
  • October 2024

... With the rapid technological advances in predictive AI, the increasing ease of its use and the growing availability of large datasets, we invite entrepreneurship scholars to use their creativity and entrepreneurial spirit by adopting predictive AI methods by actively engaging in the 'data game' and discovering, and harnessing, the vast amounts of data that already exist on, among other things, nascent and new entrepreneurs, their startups, entrepreneurial ecosystems, their culture, and their policies (Grégoire et al., 2024). This may also involve the process of generating synthetic data (van Breugel et al., 2024). ...

Synthetic data in biomedicine via generative artificial intelligence
  • Citing Article
  • October 2024

Nature Reviews Bioengineering

... Developing autoML systems that can utilize longitudinal datasets to track changes and predict disease progression is essential [43]. Such systems would be valuable for monitoring chronic conditions like periodontal disease or assessing post-surgical healing in implant cases. ...

Predicting rapid progression in knee osteoarthritis: a novel and interpretable automated machine learning approach, with specific focus on young patients and early disease

Annals of the Rheumatic Diseases

... This alignment can improve also explainability by, for example, forcing the model to be never too far from a full-knowledge model [17,53]. Techniques include using machine learning predictions as inputs to physical models for more controlled outcomes or applying logical rules to rectify inconsistencies in predictions [66,67] (for example, a prediction indicating cancer should not concurrently suggest healthiness). Post-processing is about leveraging the existing machine learning capabilities as-is and employing domain knowledge to contextualize and correct the model's predictions [68]. ...

Machine learning with requirements: A manifesto

... The first dataset (n = 82) comprised participants between 60 and 79 years old who underwent in-hospital PSG for suspected sleep apnea and shall be further referred to as the senior sleep dataset (Table 1) [9,10]. The second dataset (n = 65), further referred to as the Alzheimer's sleep dataset, was obtained in a home-based environment and comprised patients with AD and control participants between 55 and 85 years old (Table 2) [11]. Technical details and extended dataset details are available in the Methods. ...

Automated remote sleep monitoring needs uncertainty quantification
  • Citing Article
  • August 2024

Journal of Sleep Research

... See also [15], [12], [14], [27], [28], [29], [30], [31], [32], [33], [34], and [35]. Emerging methodologies include the use of (deep) neural networks [36,37,38], representational learning [39], and large language models [40,41,42,43]. ...

Causal Deep Learning: Encouraging Impact on Real-world Problems Through Causality
  • Citing Article
  • January 2024

Foundations and Trends® in Signal Processing

... (Oakden-Rayner et al., 2020; Suresh et al., 2018; Goel et al., 2020; Cabrera et al., 2019; van Breugel et al., 2024) Poor generalization, Fairness concerns Data shift Changes due to novel equipment, different measurement units, or clinical practice evolution over time. (Pianykh et al., 2020; Koh et al., 2021; Patel et al., 2008; Goetz et al., 2024) Poor generalization, Model bias, Need for continuous monitoring focus on domain-specific model classes and model interpretability. Different from typical data science tasks that mainly focus on classification and regression: domain-specific model classes account for temporal dependencies, hierarchical structures, and clinical context, ensuring that models are both accurate and practically applicable. ...

Generalization—a key challenge for responsible AI in patient-facing clinical applications

npj Digital Medicine

... Another literature stream focuses on partial identification under general causal graphs (Balazadeh et al., 2022), including IV settings with continuous variables such as continuous treatments (Gunsilius, 2020;Hu et al., 2021;Kilbertus et al., 2020;Padh et al., 2023). However, these methods either make strong assumptions about the treatment response functions or require unstable optimization via adversarial training and/or generative modeling such as through using GANs. ...

Stochastic Causal Programming for Bounding Treatment Effects

... Recent studies have highlighted the potential of machine learning (ML) methods to identify individual characteristics that influence treatment efficacy [25][26][27][28]. Due to their strong computational power, ML approaches may reveal specific patterns that might be missed by more traditional methods [29]. ...

Causal machine learning for predicting treatment outcomes
  • Citing Article
  • April 2024

Nature Medicine