Mauricio Barahona
Imperial College London · Department of Mathematics
PhD, MIT
Chair in Biomathematics, Dept of Mathematics, Imperial College London
About
352 Publications · 68,346 Reads · 12,778 Citations
Introduction
Graphs and dynamics. Community detection.
Graph-based and geometric machine learning.
Dimensionality reduction. Algorithms for nonlinear signal analysis.
Precision healthcare.
Theory of synchronization.
Mathematical and computational biology. Multiscale dynamics and model reduction of bio-systems.
Methods for the analysis of single-cell omics data.
Graph-based methods for structural biology and chemistry.
Stochastic processes and networks in biology.
Additional affiliations
January 1997 - June 1999
June 1996 - September 1996
July 1999 - March 2001
Education
September 1991 - May 1996
Publications (352)
We show that the classification performance of graph convolutional networks (GCNs) is related to the alignment between features, graph, and ground truth, which we quantify using a subspace alignment measure (SAM) corresponding to the Frobenius norm of the matrix of pairwise chordal distances between three subspaces associated with features, graph,...
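The measure named in this abstract can be sketched under stated assumptions. The code takes three hypothetical basis matrices (standing in for the feature, graph, and ground-truth subspaces, whose exact construction is in the paper and not reproduced here). It computes the chordal distance between each pair from the principal angles, sqrt(Σ sin²θᵢ), and returns the Frobenius norm of the resulting 3×3 distance matrix. This is an illustration of the stated definition, not the authors' code, and it assumes each basis has full column rank.

```python
import numpy as np


def orthonormal_basis(X: np.ndarray) -> np.ndarray:
    """Orthonormal basis (as columns) for the column space of X (assumed full column rank)."""
    Q, _ = np.linalg.qr(X)
    return Q


def chordal_distance(A: np.ndarray, B: np.ndarray) -> float:
    """Chordal distance sqrt(sum_i sin^2 theta_i) between span(A) and span(B).

    theta_i are the min(dim A, dim B) principal angles between the subspaces."""
    Qa, Qb = orthonormal_basis(A), orthonormal_basis(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.clip(np.linalg.svd(Qa.T @ Qb, compute_uv=False), 0.0, 1.0)
    return float(np.sqrt(np.sum(1.0 - cosines**2)))


def subspace_alignment_measure(features, graph, truth) -> float:
    """Frobenius norm of the symmetric 3x3 matrix of pairwise chordal distances."""
    bases = [features, graph, truth]
    D = np.zeros((3, 3))
    for i in range(3):
        for j in range(i + 1, 3):
            D[i, j] = D[j, i] = chordal_distance(bases[i], bases[j])
    return float(np.linalg.norm(D, "fro"))
```

For three mutually orthogonal lines in R³ every pairwise chordal distance is 1, so the measure is √6. For three identical subspaces it is 0, so smaller values indicate closer alignment.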
Background
Real-time prediction is key to prevention and control of infections associated with health-care settings. Contacts enable spread of many infections, yet most risk prediction frameworks fail to account for their dynamics. We developed, tested, and internationally validated a real-time machine-learning framework, incorporating dynamic pati...
Measurements of systems taken along a continuous functional dimension, such as time or space, are ubiquitous in many fields, from the physical and biological sciences to economics and engineering. Such measurements can be viewed as realisations of an underlying smooth process sampled over the continuum. However, traditional methods for independence...
The high binding affinity of antibodies towards their cognate targets is key to eliciting effective immune responses, as well as to the use of antibodies as research and therapeutic tools. Here, we propose ANTIPASTI, a Convolutional Neural Network model that achieves state-of-the-art performance in the prediction of antibody binding affinity using...
Recently, random lasing in complex networks has shown efficient lasing over more than 50 localised modes, promoted by multiple scattering over the underlying graph. If controlled, these network lasers can lead to fast-switching multifunctional light sources with synthesised spectrum. Here, we observe both in experiment and theory high sensitivity o...
Raman spectroscopy is widely used across scientific domains to characterize the chemical composition of samples in a nondestructive, label-free manner. Many applications entail the unmixing of signals from mixtures of molecular species to identify the individual components present and their proportions, yet conventional methods for chemometrics oft...
Background
Relational continuity in primary and secondary care is linked to better health outcomes for patients, but it is unclear whether metrics of continuity in each setting are associated. Our study examined the association between relational continuity in general practice (GP) and continuity of hospital outpatient specialties in people with cl...
Background
People with multimorbidity are often seen in many different specialist health services, resulting in fragmented care. Conventional services are designed around specialties based on anatomical systems, rather than diseases that occur together. We examined whether organising services around clusters of co-occurring diseases would lead to f...
The high binding affinity of antibodies toward their cognate targets is key to eliciting effective immune responses, as well as to the use of antibodies as research and therapeutic tools. Here, we propose ANTIPASTI, a convolutional neural network model that achieves state-of-the-art performance in the prediction of antibody binding affinity using a...
Current anticancer therapies suffer from issues such as off‐target side effects and the emergence of drug resistance; therefore, the discovery of alternative therapeutic approaches is vital. These can include the development of drugs with different modes of action, and the exploration of new biomolecular targets. For the former, there has been incr...
Pluripotent progenitors undergo dramatic cellular and biochemical transformations during peri-implantation development. These large-scale reprogramming events are fundamental for subsequent differentiation, but how they are integrated and co-ordinated with the preservation of genome integrity remains unknown. Here, we uncover a metabolism-induced te...
Conventional lasers typically support a well‐defined comb of modes. Coupling many resonators together to form larger complex cavities enables the design of the spatial and spectral distribution of modes, for sensitive and controllable on‐chip light sources. Network lasers, formed from a mesh of dye‐doped polymer interconnecting waveguides, have sho...
Traditional models reliant solely on pairwise associations often prove insufficient in capturing the complex statistical structure inherent in multivariate data. Yet existing methods for identifying information shared among groups of $d>3$ variables are often intractable; asymmetric around a target variable; or unable to consider all factorisations...
We are pleased to announce that the presentations and posters of the Annual Computational Neuroscience Meeting (CNS*2023) are now available. The detailed program is on the official website:
https://cns2023.sched.com ...
Join us at the Annual Computational Neuroscience Meeting.
With the growing prevalence of AI, demand increases for efficient machine learning hardware. Physical systems are sought which combine image feature detection with the essential nonlinearity for tasks such as image classification. Existing physical hardware typically detects features linearly, then employs digital processing for nonlinear activatio...
Background
Due to its late stage of diagnosis, lung cancer is the commonest cause of death from cancer in the UK. Existing epidemiological risk models in clinical usage, which have Positive Predictive Values (PPV) of less than 10%, do not consider the temporal relations expressed in sequential electronic health record (EHR) data. Machine learning wi...
Understanding the behaviour of complex laser systems is an outstanding challenge, especially in the presence of nonlinear interactions between modes. Hidden features, such as the gain distributions and spatial localisation of lasing modes, often cannot be revealed experimentally, yet they are crucial to determining the laser action. Here, we introd...
Near-field coupling between nanolasers enables collective high-power lasing but leads to complex spectral reshaping and multimode operation, limiting the emission brightness, spatial coherence and temporal stability. Many lasing architectures have been proposed to circumvent this limitation, based on symmetries, topology, or interference. We show t...
Inferring parameters of models of biochemical kinetics from single-cell data remains challenging because of the uncertainty arising from the intractability of the likelihood function of stochastic reaction networks. Such uncertainty falls beyond current error quantification measures, which focus on the effects of finite sample size and identifiabil...
A job usually involves the application of several complementary or synergistic skills to perform its required tasks. Such relationships are implicitly recognised by employers in the skills they demand when recruiting new employees. Here we construct a skills network based on their co-occurrence in a national level data set of 65 million job posting...
Background
Identifying clusters of diseases may aid understanding of shared aetiology, management of co-morbidities, and the discovery of new disease associations. Our study aims to identify disease clusters using a large set of long-term conditions and to compare methods that use the co-occurrence of diseases versus methods that use the sequence of...
Raman spectroscopy is a nondestructive and label-free chemical analysis technique, which plays a key role in the analysis and discovery cycle of various branches of science. Nonetheless, progress in Raman spectroscopic analysis is still impeded by the lack of software, methodological and data standardization, and the ensuing fragmentation and lack...
Objective
Natural language processing (NLP) algorithms are increasingly being applied to obtain unsupervised representations of electronic health record (EHR) data, but their comparative performance at predicting clinical endpoints remains unclear. Our objective was to compare the performance of unsupervised representations of sequences of disease...
Background
Identifying clusters of co-occurring diseases may help characterise distinct phenotypes of Multiple Long-Term Conditions (MLTC). Understanding the associations of disease clusters with health-related outcomes requires a strategy to assign clusters to people, but it is unclear how the performance of different strategies compares.
Aims
First, to com...
We present PyGenStability, a general-use Python software package that provides a suite of analysis and visualisation tools for unsupervised multiscale community detection in graphs. PyGenStability finds optimized partitions of a graph at different levels of resolution by maximizing the generalized Markov Stability quality function with the Louvain...
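To make the quality function concrete, here is a minimal sketch that evaluates the classic continuous-time Markov Stability of a given partition of an undirected graph, r(t, H) = trace(Hᵀ[Π exp(−tL) − ππᵀ]H), with L = I − D⁻¹A the random-walk Laplacian and π the stationary distribution. It does not use PyGenStability's API. It covers only this special case, not the generalized quality function, and it scores a fixed partition rather than optimizing it with Louvain. The graph and partitions in the usage note are hypothetical.

```python
import numpy as np


def markov_stability(A: np.ndarray, labels, t: float) -> float:
    """Continuous-time Markov Stability r(t, H) of a partition of an undirected graph.

    r(t, H) = trace(H^T [Pi exp(-t L) - pi pi^T] H), with L = I - D^{-1} A."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                     # node degrees (assumed all positive)
    two_m = d.sum()
    pi = d / two_m                        # stationary distribution of the random walk
    # L = D^{-1/2} L_sym D^{1/2} with L_sym symmetric, so exp(-t L) follows from eigh(L_sym).
    d_isqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_isqrt[:, None] * A * d_isqrt[None, :]
    w, V = np.linalg.eigh(L_sym)
    exp_sym = (V * np.exp(-t * w)) @ V.T
    # Pi exp(-t L) = D^{1/2} exp(-t L_sym) D^{1/2} / (2m): probability flow over time t.
    sqrt_d = np.sqrt(d)
    flow = sqrt_d[:, None] * exp_sym * sqrt_d[None, :] / two_m
    R = flow - np.outer(pi, pi)           # autocovariance matrix of the walk
    labels = np.asarray(labels)
    H = (labels[:, None] == np.unique(labels)[None, :]).astype(float)  # community indicators
    return float(np.trace(H.T @ R @ H))
```

For two triangles joined by a single edge, the partition into the two triangles has stability 0.5 at t = 0 (each community holds half the stationary mass). It scores higher at t = 1 than a split of the same volumes that cuts across the triangles, and it decays to 0 as t grows. The Markov time t therefore acts as the resolution parameter that is scanned to find partitions at different scales.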
Raman spectroscopy is widely used across scientific domains to characterize the chemical composition of samples in a non-destructive, label-free manner. Many applications entail the unmixing of signals from mixtures of molecular species to identify the individual components present and their proportions, yet conventional methods for chemometrics of...
Objective
To determine the extent to which the choice of timeframe used to define a long term condition affects the prevalence of multimorbidity and whether this varies with sociodemographic factors.
Design
Retrospective study of disease code frequency in primary care electronic health records.
Data sources
Routinely collected, general practice,...
Background
Carbapenemase-producing Enterobacterales (CPE) are challenging in healthcare, with resistance to multiple classes of antibiotics. This study describes the emergence of IMP-encoding CPE amongst diverse Enterobacterales species between 2016 and 2019 across a London regional network.
Methods
We performed a network analysis of patient pathw...
The personal well-being of workers may be influenced by the risk of job automation brought about by technological innovation. Here we use data from the Understanding Society survey in the UK and a fixed-effects model to examine associations between working in a highly automatable job and life and job satisfaction. We find that employees in highly a...
Multivariate time-series data that capture the temporal evolution of interconnected systems are ubiquitous in diverse areas. Understanding the complex relationships and potential dependencies among co-observed variables is crucial for the accurate statistical modelling and analysis of such systems. Here, we introduce kernel-based statistical tests...
Natural language processing (NLP) is increasingly being applied to obtain unsupervised representations of electronic healthcare record (EHR) data, but their performance for the prediction of clinical endpoints remains unclear. Here we use primary care EHRs from 6,286,233 people with Multiple Long-Term Conditions in England to generate vector repres...
From the perspective of human mobility, the COVID-19 pandemic constituted a natural experiment of enormous reach in space and time. Here, we analyse the inherent multiple scales of human mobility using Facebook Movement maps collected before and during the first UK lockdown. Firstly, we obtain the pre-lockdown UK mobility graph and employ multiscal...
Objectives
To determine whether the frequency of diagnostic codes for long-term conditions (LTCs) in primary care electronic healthcare records (EHRs) is associated with (1) disease coding incentives, (2) General Practice (GP), (3) patient sociodemographic characteristics and (4) calendar year of diagnosis.
Design
Retrospective cohort study.
Sett...
The statistical structure of the environment is often important when making decisions. There are multiple theories of how the brain represents statistical structure. One such theory states that neural activity spontaneously samples from probability distributions. In other words, the network spends more time in states which encode high-probability s...
Understanding and adequately assessing the difference between a true and a learnt causal graph is crucial for causal inference under interventions. As an extension to the graph-based structural Hamming distance and structural intervention distance, we propose a novel continuous-measured metric that considers the underlying data in addition to the...
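As a reference point for the graph-based baseline mentioned above, the following sketch computes a structural Hamming distance between two directed graphs given as adjacency matrices. It is not the continuous metric proposed in the paper. It assumes the common convention that a missing, extra, or reversed edge each counts as 1; some implementations count a reversal as 2.

```python
import numpy as np


def structural_hamming_distance(true_adj, learnt_adj) -> int:
    """Number of vertex pairs whose edge status differs between two directed graphs.

    The status of a pair (i, j) is the tuple (edge i->j present, edge j->i present),
    so a missing, extra, or reversed edge each adds 1 to the distance."""
    T = np.asarray(true_adj, dtype=bool)
    G = np.asarray(learnt_adj, dtype=bool)
    n = T.shape[0]
    distance = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (bool(T[i, j]), bool(T[j, i])) != (bool(G[i, j]), bool(G[j, i])):
                distance += 1
    return distance
```

For the chain 0 → 1 → 2, reversing 1 → 2, dropping it, or adding 0 → 2 each gives a distance of 1. Because the count looks only at graph structure, two learnt graphs with the same distance can differ greatly in their interventional consequences, which is the gap a data-aware metric targets.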
Raman spectroscopy is a non-destructive and label-free chemical analysis technique, which plays a key role in the analysis and discovery cycle of various branches of science. Nonetheless, progress in Raman spectroscopic analysis is still impeded by the lack of software, methodological and data standardisation, and the ensuing fragmentation and lack...
Identifying clusters of co-occurring diseases can aid understanding of shared aetiology, management of co-morbidities, and the discovery of new disease associations. Here, we use data from a population of over ten million people with multimorbidity registered to primary care in England to identify disease clusters through a two-stage process. First...
Models that rely solely on pairwise relationships often fail to capture the complete statistical structure of the complex multivariate data found in diverse domains, such as socio-economic, ecological, or biomedical systems. Non-trivial dependencies between groups of more than two variables can play a significant role in the analysis and modelling...
In many applications in data clustering, it is desirable to find not just a single partition but a sequence of partitions that describes the data at different scales, or levels of coarseness, leading naturally to Sankey diagrams as descriptors of the data. The problem of multiscale clustering then becomes how to select robust intrinsic scales, a...
The dynamics of neuron populations during diverse behaviours evolve on low-dimensional manifolds. However, it remains challenging to disentangle the role of manifold geometry and dynamics in encoding task variables. Here, we introduce an unsupervised geometric deep learning framework for representing non-linear dynamical systems based on statistica...
The dynamics of neuron populations during diverse tasks often evolve on low-dimensional manifolds. However, it remains challenging to discern the contributions of geometry and dynamics for encoding relevant behavioural variables. Here, we introduce an unsupervised geometric deep learning framework for representing non-linear dynamical systems based...
We apply different feature engineering methods for time-series to US market price data. The predictive power of the models is tested against Numerai-Signals targets.
In this paper, we explore the use of different feature engineering and dimensionality reduction methods in multivariate time-series modelling. Using a feature-target cross-correlation time-series dataset created from the Numerai tournament, we demonstrate that, in the over-parameterised regime, both the performance and predictions from different feature engi...
A signal mixer made of a transistor facilitates rich computation that has been the building block of modern telecommunication. Here we report that a neural cell is also a signal mixer. We found through ex vivo and in vivo measurements that individual neurons mix exogenous (controlled) and endogenous (spontaneous) subthreshold membrane potential osc...
The application of deep learning algorithms to financial data is difficult due to heavy non-stationarities which can lead to over-fitted models that underperform under regime changes. Using the Numerai tournament data set as a motivating example, we propose a machine learning pipeline for trading market-neutral stock portfolios based on tabular dat...
The effectiveness of Bayesian Additive Regression Trees (BART) has been demonstrated in a variety of contexts including non-parametric regression and classification. A BART scheme for estimating the intensity of inhomogeneous Poisson processes is introduced. Poisson intensity estimation is a vital task in various applications including medical imag...
The dynamics of many systems from physics, economics, chemistry, and biology can be modelled through polynomial functions. In this paper, we provide a computational means to find positively invariant sets of polynomial dynamical systems by using semidefinite programming to solve sum-of-squares (SOS) programmes. With the emergence of SOS programmes,...
Allostery commonly refers to the mechanism that regulates protein activity through the binding of a molecule at a different, usually distal, site from the orthosteric site. The omnipresence of allosteric regulation in nature and its potential for drug design and screening render the study of allostery invaluable. Nevertheless, challenges remain as...
Early diagnosis of disease can result in improved health outcomes, such as higher survival rates and lower treatment costs. With the massive amount of information in electronic health records (EHRs), there is great potential to use machine learning (ML) methods to model disease progression aimed at early prediction of disease onset and other outcom...
Inhibiting the main protease of SARS-CoV-2 is of great interest in tackling the COVID-19 pandemic caused by the virus. Most efforts have been centred on inhibiting the binding site of the enzyme. However, considering allosteric sites, distant from the active or orthosteric site, broadens the search space for drug candidates and confers the advantag...
Directed acyclic graphs (DAGs) are a useful tool to represent, in a graphical format, researchers’ assumptions about the causal structure among variables while providing a rationale for the choice of confounding variables to adjust for. With origins in the field of probabilistic graphical modelling, DAGs are yet to be widely adopted in applied heal...
Dimension is a fundamental property of objects and the space in which they are embedded. Yet ideal notions of dimension, as in Euclidean spaces, do not always translate to physical spaces, which can be constrained by boundaries and distorted by inhomogeneities, or to intrinsically discrete systems such as networks. To take into account locality, fi...
Background: Global sustainability is an enmeshed system of complex socioeconomic, climatological, and ecological interactions. The numerous objectives of the UN's Sustainable Development Goals (SDGs) and the Paris Agreement have various levels of interdependence, making it difficult to ascertain the influence of changes to particular indicators acr...
This appendix formed part of the original submission and has been peer reviewed. We post it as supplied by the authors.
Supplement to: Laumann F, von Kügelgen J, Kanashiro Uehara TH, Barahona M. Complex interlinkages, key objectives, and nexuses among the Sustainable Development Goals and climate change: a network analysis. Lancet Planet Health 202...
The identification of essential genes, i.e. those that impair cell survival when deleted, requires large growth assays of knock-out strains. The complexity and cost of such experiments have triggered a growing interest in computational methods for gene essentiality prediction. In the case of metabolic genes, Flux Balance Analysis (FBA) is widely emp...
Purpose
Contact tracing is a crucial tool in infection prevention and control (IPC), which aims to identify outbreaks and prevent onward transmission. What constitutes a contact is typically based on strict binary criteria (i.e., being at a location at the same time). Missing data, indirect contacts and background sources can however substantially...
Purpose
Predicting healthcare-acquired infections (HAIs) has the potential to revolutionise the prevention and control of transmissible infections. Existing prediction models for HAIs, however, fail to capture fully the contact-driven nature of infectious diseases. Here, we investigate the epidemiological predictivity of patient contact patterns th...
Strict lockdown measures have been put in place in many countries around the world to constrain human mobility in response to the unparalleled challenges posed by the COVID-19 pandemic. Here we apply network-theoretic tools to analyse a geolocalised dataset of human mobility of 16 million UK Facebook users from March to July 2020. A special emphasi...
The identification of essential genes, i.e. those that impair cell survival when deleted, requires large growth assays of knock-out strains. The complexity and cost of such experiments have triggered a growing interest in computational methods for prediction of gene essentiality. In the case of metabolic genes, Flux Balance Analysis (FBA) is widely...