Mateja Jamnik
University of Cambridge · Computer Laboratory
PhD in Artificial Intelligence
About
177 Publications · 20,173 Reads
980 Citations
Introduction
Additional affiliations
March 2002 - April 2002
January 2003 - present
April 2002 - April 2012
Education
November 1995 - October 1998
October 1994 - August 1995
September 1991 - August 1994
Publications (177)
Integrating AI in healthcare can greatly improve patient care and system efficiency. However, the lack of explainability in AI systems (XAI) hinders their clinical adoption, especially in multimodal settings that use increasingly complex model architectures. Most existing XAI methods focus on unimodal models, which fail to capture cross-modal inter...
Computational analysis of whole slide images (WSIs) has seen significant research progress in recent years, with applications ranging across important diagnostic and prognostic tasks such as survival or cancer subtype prediction. Many state-of-the-art models process the entire slide - which may be as large as $150,000 \times 150,000$ pixels - as a...
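For context, gigapixel slides of this size are usually tiled into patches, each patch is embedded, and the patch embeddings are pooled into one slide-level prediction. Below is a minimal sketch of attention-based multiple-instance-learning pooling over pre-extracted patch features; it is an illustrative baseline, not necessarily the model described in the abstract, and the feature dimensions are assumptions.

```python
# Minimal sketch (assumption): attention-based MIL pooling over pre-extracted patch
# embeddings -- an illustrative baseline for slide-level prediction, not the paper's model.
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, feat_dim=512, hidden=128, n_classes=2):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, patch_feats):                             # (n_patches, feat_dim)
        weights = torch.softmax(self.attn(patch_feats), dim=0)  # attention over patches
        slide_feat = (weights * patch_feats).sum(dim=0)         # weighted slide embedding
        return self.classifier(slide_feat)                      # slide-level logits

patch_feats = torch.randn(1000, 512)   # hypothetical patch features from a WSI encoder
logits = AttentionMIL()(patch_feats)
```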
Ontologies are useful for automatic machine processing of domain knowledge as they represent it in a structured format. Yet, constructing ontologies requires substantial manual effort. To automate part of this process, large language models (LLMs) have been applied to solve various subtasks of ontology learning. However, this partial ontology learn...
Deep neural networks trained via empirical risk minimisation often exhibit significant performance disparities across groups, particularly when group and task labels are spuriously correlated (e.g., "grassy background" and "cows"). Existing bias mitigation methods that aim to address this issue often either rely on group labels for training or vali...
Data collection is often difficult in critical fields such as medicine, physics, and chemistry. As a result, classification methods usually perform poorly with these small datasets, leading to weak predictive performance. Increasing the training set with additional synthetic data, similar to data augmentation in images, is commonly believed to impr...
Text embeddings are essential for many tasks, such as document retrieval, clustering, and semantic similarity assessment. In this paper, we study how to contrastively train text embedding models in a compute-optimal fashion, given a suite of pre-trained decoder-only language models. Our innovation is an algorithm that produces optimal configuration...
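The contrastive training step referred to here typically uses an in-batch InfoNCE objective. The sketch below shows that standard loss under the assumption of one positive document per query; the paper's compute-optimal configuration search is not reproduced.

```python
# Minimal sketch (assumption): in-batch contrastive (InfoNCE) objective commonly used to
# train text embedding models; the compute-optimal configuration algorithm is not shown.
import torch
import torch.nn.functional as F

def info_nce(query_emb: torch.Tensor, doc_emb: torch.Tensor, temperature: float = 0.05):
    """Row i of query_emb and doc_emb is a positive pair; other rows act as negatives."""
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature               # (batch, batch) cosine similarities
    labels = torch.arange(q.size(0))             # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(32, 768), torch.randn(32, 768))
```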
There is much excitement about the opportunity to harness the power of large language models (LLMs) when building problem-solving assistants. However, the standard methodology of evaluating LLMs relies on static pairs of inputs and outputs; this is insufficient for making an informed decision about which LLMs are best to use in an interactive setti...
Tabular data is prevalent in many critical domains, yet it is often challenging to acquire in large quantities. This scarcity usually results in poor performance of machine learning models on such data. Data augmentation, a common strategy for performance improvement in vision and language tasks, typically underperforms for tabular data due to the...
Learning holistic computational representations in physical, chemical or biological systems requires the ability to process information from different distributions and modalities within the same model. Thus, the demand for multimodal machine learning models has sharply risen for modalities that go beyond vision and language, such as sequences, gra...
Concept-based explainability methods provide insight into deep learning systems by constructing explanations using human-understandable concepts. While the literature on human reasoning demonstrates that we exploit relationships between concepts when solving tasks, it is unclear whether concept-based methods incorporate the rich structure of inter-...
In this paper we introduce MaRE, a tool designed to generate representations in multiple modalities for a given mathematical problem while ensuring the correctness and interpretability of the transformations between different representations. The theoretical foundation for this tool is Representational Systems Theory (RST), a mathematical framework...
The opaque reasoning of Graph Neural Networks induces a lack of human trust. Existing graph network explainers attempt to address this issue by providing post-hoc explanations, however, they fail to make the model itself more interpretable. To fill this gap, we introduce the Concept Distillation Module, the first differentiable concept-distillation...
Rule-based surrogate models are an effective and interpretable way to approximate a Deep Neural Network’s (DNN) decision boundaries, allowing humans to easily understand deep learning models. Current state-of-the-art decompositional methods, which are those that consider the DNN’s latent space to extract more exact rule sets, manage to derive rule...
Concept-based interpretability addresses the opacity of deep neural networks by constructing an explanation for a model's prediction using high-level units of information referred to as concepts. Research in this area, however, has been mainly focused on image and graph-structured data, leaving high-stakes tasks whose data is tabular out of reach o...
Variational Autoencoders and their many variants have displayed impressive ability to perform dimensionality reduction, often achieving state-of-the-art performance. Many current methods however, struggle to learn good representations in High Dimensional, Low Sample Size (HDLSS) tasks, which is an inherently challenging setting. We address this cha...
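For readers unfamiliar with the baseline being extended, a vanilla VAE with the reparameterisation trick looks roughly like the sketch below (standard Gaussian prior assumed; this is not the method proposed in the abstract, and the layer sizes are illustrative).

```python
# Minimal sketch (assumption): a vanilla VAE with the reparameterisation trick, shown on
# an HDLSS-shaped input (many features, few samples); not the paper's proposed model.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, in_dim, latent_dim=8, hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, in_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterisation
        recon = self.dec(z)
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1).mean()
        return recon, kl

x = torch.randn(16, 2000)                 # 16 samples, 2000 features (hypothetical)
recon, kl = VAE(in_dim=2000)(x)
loss = nn.functional.mse_loss(recon, x) + kl
```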
Tabular biomedical data is often high-dimensional but with a very small number of samples. Although recent work showed that well-regularised simple neural networks could outperform more sophisticated architectures on tabular data, they are still prone to overfitting on tiny datasets with many potentially irrelevant features. To combat these issues,...
Recent work on interpretability has focused on concept-based explanations, where deep learning models are explained in terms of high-level units of information, referred to as concepts. Concept learning models, however, have been shown to be prone to encoding impurities in their representations, failing to fully capture meaningful features of their...
Tabular biomedical data poses challenges in machine learning because it is often high-dimensional and typically low-sample-size. Previous research has attempted to address these challenges via feature selection approaches, which can lead to unstable performance on real-world data. This suggests that current methods lack appropriate inductive biases...
The standard methodology of evaluating large language models (LLMs) based on static pairs of inputs and outputs is insufficient for developing assistants: this kind of assessment fails to take into account the essential interactive element in their deployment, and therefore limits how we understand language model capabilities. We introduce CheckMa...
Spatial transcriptomic technologies profile gene expression in-situ, facilitating the spatial characterisation of molecular phenomena within tissues, yet often at multi-cellular resolution. Computational approaches have been developed to infer fine-grained cell-type compositions across locations, but they frequently treat neighbouring spots indepen...
Deep learning methods are highly accurate, yet their opaque decision process prevents them from earning full human trust. Concept-based models aim to address this issue by learning tasks based on a set of human-understandable concepts. However, state-of-the-art concept-based models rely on high-dimensional concept embedding representations which la...
Competence Assessment by Chunk Hierarchy Evaluation with Transcription-tasks (CACHET) was proposed by Cheng [14]. It analyses micro-behaviors captured during cycles of stimulus viewing and copying in order to probe chunk structures in memory. This study extends CACHET by applying it to the domain of graphs and charts. Since drawing strategies are d...
Placing a human in the loop may abate the risks of deploying AI systems in safety-critical settings (e.g., a clinician working with a medical AI system). However, mitigating risks arising from human error and uncertainty within such human-AI interactions is an important and understudied issue. In this work, we study human uncertainty in the context...
Explainable AI (XAI) underwent a recent surge in research on concept extraction, focusing on extracting human-interpretable concepts from Deep Neural Networks. An important challenge facing concept extraction approaches is the difficulty of interpreting and evaluating discovered concepts, especially for complex tasks such as molecular property pred...
By one of the most fundamental principles in physics, a dynamical system will exhibit those motions which extremise an action functional. This leads to the formation of the Euler-Lagrange equations, which serve as a model of how the system will behave in time. If the dynamics exhibit additional symmetries, then the motion fulfils additional conserv...
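In textbook form, the objects this abstract refers to are the action functional, its stationarity condition (the Euler-Lagrange equations), and the Noether quantity conserved when the Lagrangian has a continuous symmetry. The notation below is the standard one, not necessarily the paper's.

```latex
% Action functional, Euler-Lagrange equations, and Noether's conserved quantity (textbook form).
\[
  S[q] = \int_{t_0}^{t_1} L\bigl(q,\dot{q},t\bigr)\,dt,
  \qquad
  \delta S = 0 \;\Longrightarrow\;
  \frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = 0,
  \qquad
  Q = \frac{\partial L}{\partial \dot{q}}\,\delta q \ \text{ conserved under a symmetry } q \mapsto q + \epsilon\,\delta q .
\]
```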
Recent work has suggested post-hoc explainers might be ineffective for detecting spurious correlations in Deep Neural Networks (DNNs). However, we show there are serious weaknesses with the existing evaluation frameworks for this setting. Previously proposed metrics are extremely difficult to interpret and are not directly comparable between explai...
Genome-wide studies leveraging recent high-throughput sequencing technologies collect high-dimensional data. However, they usually include small cohorts of patients, and the resulting tabular datasets suffer from the "curse of dimensionality". Training neural networks on such datasets is typically unstable, and the models overfit. One problem is th...
The formalization of existing mathematical proofs is a notoriously difficult process. Despite decades of research on automation and proof assistants, writing formal proofs remains arduous and only accessible to a few experts. While previous studies to automate formalization focused on powerful search algorithms, no attempts were made to take advant...
Deploying AI-powered systems requires trustworthy models supporting effective human interactions, going beyond raw prediction accuracy. Concept bottleneck models promote trustworthiness by conditioning classification tasks on an intermediate level of human-like concepts. This enables human interventions which can correct mispredicted concepts to im...
In this paper we study the practicality and usefulness of incorporating distributed representations of graphs into models within the context of drug pair scoring. We argue that the real world growth and update cycles of drug pair scoring datasets subvert the limitations of transductive learning associated with distributed representations. Furthermo...
A cognitive theory of the interpretive structure of visual representations (RIST) was proposed by Cheng (2020), which identified four classes of schemas that specify how domain concepts are encoded by graphical objects. A notation (RISN) for building RIST models as networks of these schemas was also introduced. This paper introduces common RIST/RIS...
This paper is the first to establish the impact of colour on users’ ability to interpret the informational content of concept diagrams, a logic designed for ontology engineering. Motivation comes from results for Euler diagrams, which form a fragment of concept diagrams: manipulating curve colours affects user performance. In particular, using dist...
The opaque reasoning of Graph Neural Networks induces a lack of human trust. Existing graph network explainers attempt to address this issue by providing post-hoc explanations, however, they fail to make the model itself more interpretable. To fill this gap, we introduce the Concept Encoder Module, the first differentiable concept-discovery approac...
Machine learning systems based on minimizing average error have been shown to perform inconsistently across notable subsets of the data, which is not exposed by a low average error for the entire dataset. In consequential social and economic applications, where data represent people, this can lead to discrimination of underrepresented gender and et...
Autoformalization is the process of automatically translating from natural language mathematics to formal specifications and proofs. A successful autoformalization system could advance the fields of formal verification, program synthesis, and artificial intelligence. While the long-term goal of autoformalization seemed elusive for a long time, we s...
In theorem proving, the task of selecting useful premises from a large library to unlock the proof of a given conjecture is crucially important. This presents a challenge for all theorem provers, especially the ones based on language models, due to their relative inability to reason over huge volumes of premises in text form. This paper introduces...
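One common way to frame premise selection is as dense retrieval: embed the conjecture and every library premise, then rank premises by similarity. The sketch below shows that framing only; the paper's own architecture is not reproduced, and `embed` is a hypothetical stand-in for any statement encoder.

```python
# Minimal sketch (assumption): dense-retrieval premise selection by cosine similarity.
# `embed` is a hypothetical placeholder for a real text encoder.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical encoder: returns a fixed-size vector for a statement."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(256)

def select_premises(conjecture: str, library: list[str], k: int = 3) -> list[str]:
    c = embed(conjecture)
    scores = []
    for p in library:
        v = embed(p)
        scores.append(float(c @ v / (np.linalg.norm(c) * np.linalg.norm(v) + 1e-9)))
    top = np.argsort(scores)[::-1][:k]          # highest-similarity premises first
    return [library[i] for i in top]

print(select_premises("n + 0 = n", ["add_zero", "mul_comm", "zero_add", "succ_inj"]))
```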
Motivation: Gene expression data is commonly used at the intersection of cancer research and machine learning for better understanding of the molecular status of tumour tissue. Deep learning predictive models have been employed for gene expression data due to their ability to scale and remove the need for manual feature engineering. However, gene ex...
In recent years, there has been significant work on increasing both interpretability and debuggability of a Deep Neural Network (DNN) by extracting a rule-based model that approximates its decision boundary. Nevertheless, current DNN rule extraction methods that consider a DNN's latent space when extracting rules, known as decompositional algorithm...
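A toy illustration of the "decompositional" idea mentioned here is to fit an interpretable rule learner on a hidden layer's activations rather than on the raw inputs. The sketch below does exactly that with a shallow decision tree; it is illustrative only (the network is untrained and the task synthetic), not the algorithm proposed in the paper.

```python
# Minimal sketch (assumption): fit a shallow decision tree on a DNN's latent activations
# to obtain human-readable rules -- a toy version of the decompositional idea.
import torch
import torch.nn as nn
from sklearn.tree import DecisionTreeClassifier, export_text

X = torch.randn(500, 10)
y = (X[:, 0] + X[:, 1] > 0).long()              # synthetic task (unused; kept for context)

net = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
# (training loop omitted; a real surrogate would be fitted to a trained network)

with torch.no_grad():
    hidden = net[1](net[0](X))                  # latent activations after the ReLU
    preds = net(X).argmax(dim=1)                # the network's own predictions

tree = DecisionTreeClassifier(max_depth=3).fit(hidden.numpy(), preds.numpy())
print(export_text(tree))                        # rules expressed over latent features
```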
Choosing how to represent knowledge effectively is a long-standing open problem. Cognitive science has shed light on the taxonomisation of representational systems from the perspective of cognitive processes, but a similar analysis is absent from the perspective of problem solving, where the representations are employed. In this paper we review how...
We present a framework for assessing the relative cognitive cost of different representational systems for problem solving. The framework consists of 13 cognitive properties. These properties are mapped according to two dimensions: (1) the time scale of the cognitive process, and (2) the granularity of the representational system. The work includes...
New methods for the assessment of drawing strategies are examined that focus on the analysis of perceptual chunking. The methods are demonstrated with four diverse participants as they copied a line-graph and a bar-chart. Video recordings of the transcriptions were analysed stroke by stroke. Diverse global drawing strategies were used for the line...
In order to communicate, humans flatten a complex representation of ideas and their attributes into a single word or a sentence. We investigate the impact of representation learning in artificial agents by developing graph referential games. We empirically show that agents parametrized by graph neural networks develop a more compositional language...
To achieve efficient human computer collaboration, computers need to be able to represent information in ways that humans can understand. Picking a good representation is critical for effective communication and human learning, especially on technical topics. To select representations appropriately, AI systems must have some understanding of how hu...
In this chapter, I give a personal account of my experience in Alan Bundy’s DReaM group in the Department of Artificial Intelligence at the University of Edinburgh between the years of 1995 and 1998. Of course, the impact of this experience has been profound and long-lasting to this day. The culture and the nature of research work, the collaboratio...
Concept bottleneck models map from raw inputs to concepts, and then from concepts to targets. Such models aim to incorporate pre-specified, high-level concepts into the learning procedure, and have been motivated to meet three desiderata: interpretability, predictability, and intervenability. However, we find that concept bottleneck models struggle...
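The input-to-concept-to-target structure and the test-time intervention this abstract describes can be sketched in a few lines. The code below is a minimal, assumed implementation for illustration, not the models analysed in the paper; layer sizes and the intervention interface are hypothetical.

```python
# Minimal sketch (assumption): a concept bottleneck model (input -> concepts -> target)
# with a test-time intervention that overwrites a mispredicted concept.
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, in_dim, n_concepts, n_classes):
        super().__init__()
        self.x_to_c = nn.Linear(in_dim, n_concepts)      # concept predictor
        self.c_to_y = nn.Linear(n_concepts, n_classes)   # label predictor sees only concepts

    def forward(self, x, intervene=None):
        c = torch.sigmoid(self.x_to_c(x))                # predicted concept activations
        if intervene is not None:                        # e.g. {concept_index: true value}
            c = c.clone()
            for idx, value in intervene.items():
                c[:, idx] = value                        # a human corrects a concept
        return c, self.c_to_y(c)

model = ConceptBottleneck(in_dim=32, n_concepts=5, n_classes=3)
concepts, logits = model(torch.randn(4, 32), intervene={2: 1.0})
```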
Despite their remarkable performance on a wide range of visual tasks, machine learning technologies often succumb to data distribution shifts. Consequently, a range of recent work explores techniques for detecting these shifts. Unfortunately, current techniques offer no explanations about what triggers the detection of shifts, thus limiting their u...
Concept-based explanations have emerged as a popular way of extracting human-interpretable representations from deep discriminative models. At the same time, the disentanglement learning literature has focused on extracting similar representations in an unsupervised or weakly-supervised way, using deep generative models. Despite the overlapping goa...
Deep learning models are receiving increasing attention in clinical decision-making, however the lack of interpretability and explainability impedes their deployment in day-to-day clinical practice. We propose REM, an interpretable and explainable methodology for extracting rules from deep neural networks and combining them with other data-driven a...
Recurrent Neural Networks (RNNs) have achieved remarkable performance on a range of tasks. A key step to further empowering RNN-based approaches is improving their explainability and interpretability. In this work we present MEME: a model extraction approach capable of approximating RNNs with interpretable models represented by human-understandable...
We investigate the influence of adversarial training on the interpretability of convolutional neural networks (CNNs), specifically applied to diagnosing skin cancer. We show that gradient-based saliency maps of adversarially trained CNNs are significantly sharper and more visually coherent than those of standardly trained CNNs. Furthermore, we show...
Stratifying cancer patients based on their gene expression levels allows improving diagnosis, survival analysis and treatment planning. However, such data is extremely highly dimensional as it contains expression values for over 20000 genes per patient, and the number of samples in the datasets is low. To deal with such settings, we propose to inco...
Abstract reasoning is a key indicator of intelligence. The ability to hypothesise, develop abstract concepts based on concrete observations and apply this hypothesis to justify future actions has been paramount in human development. An existing line of research in outfitting intelligent machines with abstract reasoning capabilities revolves around the Raven...
Deep Neural Networks (DNNs) have achieved remarkable performance on a range of tasks. A key step to further empowering DNN-based approaches is improving their explainability. In this work we present CME: a concept-based model extraction framework, used for analysing DNN models via concept-based extracted models. Using two case studies (dSprites, an...
We propose a method for gene expression based analysis of cancer phenotypes incorporating network biology knowledge through unsupervised construction of computational graphs. The structural construction of the computational graphs is driven by the use of topological clustering algorithms on protein-protein networks which incorporate inductive biase...
Deep Graph Neural Networks (GNNs) show promising performance on a range of graph tasks, yet at present are costly to run and lack many of the optimisations applied to DNNs. We show, for the first time, how to systematically quantise GNNs with minimal or no loss in performance using Network Architecture Search (NAS). We define the possible quantisat...
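To make concrete what "quantising" a network's weights means, the sketch below applies symmetric uniform int8 fake-quantisation to a single weight matrix. It is an assumed, illustrative snippet; the paper's NAS-driven search over per-layer quantisation choices for GNNs is not reproduced.

```python
# Minimal sketch (assumption): symmetric uniform int8 fake-quantisation of one weight
# tensor -- illustrates the operation, not the NAS-based method described above.
import torch

def quantise_dequantise(w: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)   # integer codes
    return q * scale                                           # dequantised weights

w = torch.randn(64, 64)
w_q = quantise_dequantise(w)
print(f"max abs error: {(w - w_q).abs().max().item():.4f}")
```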
Choosing effective representations for a problem and for the person solving it has consequences, including the ability or inability to solve it at all. We previously devised a novel framework consisting of a language to describe representations and computational methods to analyse them in terms of their formal and cognitive properties. In this paper we demons...
Abstract reasoning, particularly in the visual domain, is a complex human ability, but it remains a challenging problem for artificial neural learning systems. In this work we propose MXGNet, a multilayer graph neural network for multi-panel diagrammatic reasoning tasks. MXGNet combines three powerful concepts, namely, object-level representation, graph neu...
While modern deep neural architectures generalise well when test data is sampled from the same distribution as training data, they fail badly for cases when the test data distribution differs from the training distribution even along a few dimensions. This lack of out-of-distribution generalisation is increasingly manifested when the tasks become m...
Modern theorem provers utilise a wide array of heuristics to control the search space explosion, thereby requiring optimisation of a large set of parameters. An exhaustive search in this multi-dimensional parameter space is intractable in most cases, yet the performance of the provers is highly dependent on the parameter assignment. In this work, w...