About
77 Publications
11,219 Reads
1,263 Citations
Additional affiliations
November 2015 - January 2017
October 2013 - October 2015
August 2012 - September 2013
Publications (77)
Objective
We develop and externally validate two models for use with radiological knee osteoarthritis (KOA). They consist of a diagnostic model for KOA and a prognostic model of time to onset of KOA. Model development and optimisation used data from the Osteoarthritis Initiative (OAI) and external validation for both models was by application to data fro...
Sepsis is a heterogeneous syndrome characterized by a variety of clinical features. Analysis of large clinical datasets may serve to define groups of sepsis with different risks of adverse outcomes. Clinical experience supports the concept that prognosis, treatment, severity, and time course of sepsis vary depending on the source of infection. We a...
The occurrence of atrial fibrillation (AF) represents clinical deterioration in acutely unwell patients and leads to increased morbidity and mortality. Prediction of the development of AF allows early intervention. Using the AmsterdamUMCdb, clinically relevant variables from patients admitted in sinus rhythm were extracted over the full duration of...
The most limiting factor in heart transplantation is lack of donor organs. With enhanced prediction of outcome, it may be possible to increase the life-years from the organs that become available. Applications of machine learning to tabular data, typical of clinical decision support, pose the practical question of interpretation which has technical...
Almost all machine learning (ML) is based on representing examples using intrinsic features. When there are multiple related ML problems (tasks), it is possible to transform these features into extrinsic features by first training ML models on other tasks and letting them each make predictions for each example of the new task, yielding a novel repr...
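The extrinsic-feature idea described in the abstract above can be illustrated with a minimal sketch. It assumes synthetic data and simple least-squares linear models; the names `fit_linear`, `predict_linear`, and `extrinsic_features` are hypothetical helpers introduced here for illustration and are not taken from the publication.

```python
import numpy as np

def fit_linear(X, y):
    """Fit a least-squares linear model with a bias term; return its weights."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_linear(w, X):
    """Apply a fitted linear model to the rows of X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def extrinsic_features(source_models, X_new):
    """Build one column per source-task model: its prediction for each new example."""
    return np.column_stack([predict_linear(w, X_new) for w in source_models])

rng = np.random.default_rng(0)
n_features = 5

# Assumption: three related source tasks, each with its own synthetic data.
source_models = []
for _ in range(3):
    X_src = rng.normal(size=(50, n_features))
    y_src = X_src @ rng.normal(size=n_features)
    source_models.append(fit_linear(X_src, y_src))

# Intrinsic features of the new task's examples (synthetic here).
X_new = rng.normal(size=(20, n_features))

# Extrinsic representation: predictions of the source-task models.
Z_new = extrinsic_features(source_models, X_new)
```

Here `Z_new` has one row per new-task example and one column per source-task model, and a model for the new task could then be trained on `Z_new` instead of, or alongside, `X_new`.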
There has been an exponential growth of artificial intelligence (AI) and machine learning (ML) publications aimed at advancing our understanding of atrial fibrillation (AF), which has been mainly driven by the confluence of two factors: the advances in deep neural networks (DeepNNs) and the availability of large, open access databases. It is observ...
Background:
Hepatosteatosis (HS) has been associated with cardiovascular disorders in the general population. We sought to investigate whether HS is a marker of CVD risk in HIV-positive individuals given that metabolic syndrome is implicated in the increasing CVD burden in this population.
Aims:
To investigate the association of hepatosteatosis...
Chitosan nanoparticles (CT NPs) have attractive biomedical applications due to their unique properties. This present research aimed at development of chitosan nanoparticles to be used as skin delivery systems for cosmetic components and drugs and to track their penetration behaviour through pig skin. CT NPs were prepared by ionic gelation technique...
The proper management of a municipal water system is essential to sustain cities and support the water security of societies. Urban water estimating has always been a challenging task for managers of water utilities and policymakers. This paper applies a novel methodology that includes data pre-processing and an Artificial Neural Network (ANN) opti...
Accurate and reliable urban water demand prediction is imperative for providing the basis to design, operate, and manage water system, especially under the scarcity of the natural water resources. A new methodology combining discrete wavelet transform (DWT) with an adaptive neuro-fuzzy inference system (ANFIS) is proposed to predict monthly urban w...
Neural networks are frequently applied to medical data. We describe how complex and imbalanced data can be modelled with simple but accurate neural networks that are transparent to the user. In the case of a data set on cervical cancer with 753 observations, excluding missing values, and 32 covariates, with a prevalence of 73 cases (9.69%), we expl...
Prostate cancer is the second most commonly occurring cancer in men. Diagnosis through Magnetic Resonance Imaging (MRI) is limited, yet current practice holds a relatively low specificity. This paper extends a previous SPIE ProstateX challenge study in three ways: (1) to include healthy tissue analysis, creating a solution suitable for clinical prac...
The aim of this paper is to build a computer based clinical decision support tool using a semi-supervised framework, the Fisher Information Network (FIN), for visualization of a set of mammographic images. The FIN organizes the images into a similarity network from which, for any new image, reference images that are closely related can be identifie...
This paper discusses the concepts of interpretability and explainability and outlines desiderata for robust interpretability. It then describes a neural network model that meets all criteria, with the addition of global faithfulness.
The goal of quantitative structure activity relationship (QSAR) learning is to learn a function that, given the structure of a small molecule (a potential drug), outputs the predicted activity of the compound. We employed multi-task learning (MTL) to exploit commonalities in drug targets and assays. We used datasets containing curated reco...
Prostate cancer is the most common cancer in men in the UK. An accurate diagnosis at the earliest stage possible is critical in its treatment. Multi-parametric Magnetic Resonance Imaging is gaining popularity in prostate cancer diagnosis; it can be used to actively monitor low-risk patients, and it is convenient due to its non-invasive nature. Howe...
This case study benchmarks a range of statistical and machine learning methods relevant to computer-based decision support in clinical medicine, focusing on the diagnosis of knee osteoarthritis at first presentation. The methods, comprising logistic regression, Multilayer Perceptron (MLP), Chi-square Automatic Interaction Detector (CHAID) and Class...
We propose a method to open the black box of the Multi-Layer Perceptron by inferring from it a simpler and generally more accurate general additive model. The resulting model comprises non-linear univariate and bivariate partial responses derived from the original Multi-Layer Perceptron. The responses are combined using the Lasso and further optimi...
Glioblastoma is the most frequent malignant intra-cranial tumour. Magnetic resonance imaging is the modality of choice in diagnosis, aggressiveness assessment, and follow-up. However, there are examples where it lacks diagnostic accuracy. Magnetic resonance spectroscopy enables the identification of molecules present in the tissue, providing a prec...
Being able to predict the activity of chemical compounds against a drug target (e.g. a protein) in the preliminary stages of drug development is critical. In drug discovery, this is known as Quantitative Structure Activity Relationships (QSARs). Datasets for QSARs are often ill-posed for traditional machine learning to provide meaningful insights (...
The key to success in machine learning (ML) is the use of effective data representations. Traditionally, data representations were hand-crafted. Recently it has been demonstrated that, given sufficient data, deep neural networks can learn effective implicit representations from simple input representations. However, for most scientific problems, th...
Major bleeding is a common complication after percutaneous coronary intervention (PCI), although little is known about how bleeding rates have changed over time and what has driven this. We analyzed all patients who underwent PCI in England and Wales from 2006 to 2013. Multivariate analyses using logistic regression models were performed to identif...
Objectives
Prasugrel and ticagrelor both reduce ischaemic endpoints in high-risk acute coronary syndromes, compared with clopidogrel. However, comparative outcomes of these two newer drugs in the context of primary percutaneous coronary intervention (PCI) for ST-elevation myocardial infarction (STEMI) remain unclear. We sought to examine this ques...
We investigate the learning of quantitative structure activity relationships (QSARs) as a case-study of meta-learning. This application area is of the highest societal importance, as it is a key step in the development of new medicines. The standard QSAR learning problem is: given a target (usually a protein) and a set of chemical compounds (small...
Objectives
This study sought to examine the relationship between access site practice and clinical outcomes in patients requiring percutaneous coronary intervention (PCI) following thrombolysis for ST-segment elevation myocardial infarction (STEMI).
Background
Transradial access (TRA) is associated with better outcomes in patients requiring PCI fo...
Research with structured Electronic Health Records (EHRs) is expanding as data becomes more accessible; analytic methods advance; and the scientific validity of such studies is increasingly accepted. However, data science methodology to enable the rapid searching/extraction, cleaning and analysis of these large, often complex, datasets is less well...
Background:
Percutaneous coronary intervention (PCI) is the most common modality of revascularization in patients with coronary artery disease. Understanding the readmission rates and reasons for readmission after PCI is important because readmissions are a quality of care indicator, in addition to being a burden to patients and healthcare service...
Aims
We sought to investigate the prognostic impact of co-morbid burden as defined by the Charlson comorbidity index (CCI) in patients with a range of prevalent cardiovascular diseases.
Methods & Results
We searched MEDLINE and EMBASE to identify studies that evaluated the impact of CCI on mortality in patients with cardiovascular disease. A rando...
Characterization of glioblastoma (GB) response to treatment is a key factor for improving patients' survival and prognosis. MRI and magnetic resonance spectroscopic imaging (MRSI) provide morphologic and metabolic profiles of GB but usually fail to produce unequivocal biomarkers of response. The purpose of this work is to provide proof of concept o...
Background:
The use of Electronic Health Records databases for medical research has become mainstream. In the UK, increasing use of Primary Care Databases is largely driven by almost complete computerisation and uniform standards within the National Health Service. Electronic Health Records research often begins with the development of a list of c...
Objectives Little is known about service utilisation by patients with severe mental illness (SMI) in UK primary care. We examined their consultation rate patterns and whether they were impacted by the introduction of the Quality and Outcomes Framework (QOF), in 2004.
Design Retrospective cohort study using individual patient data collected from 200...
Objectives Little is known about the prevalence of comorbidity rates in people with severe mental illness (SMI) in UK primary care. We calculated the prevalence of SMI by UK country, English region and deprivation quintile, antipsychotic and antidepressant medication prescription rates for people with SMI, and prevalence rates of common comorbiditi...
Magnetic resonance spectroscopy provides metabolic information about living tissues in a non-invasive way. However, there are only a few multi-centre clinical studies, mostly performed on a single scanner model or data format, as there is no flexible way of documenting and exchanging processed magnetic resonance spectroscopy data in digital format. T...
Introduction
Primary care databases (PCDs) are increasingly used as a resource for research, but issues around the validity of studies based on PCDs remain. We conducted the first fully independent replications of published PCD studies in a different PCD covering the same target population.
Methods
We replicated two previous PCD drug effectivene...
Lists of clinical codes are the foundation for research undertaken using electronic medical records (EMRs). If clinical code lists are not available, reviewers are unable to determine the validity of research, full study replication is impossible, researchers are unable to make effective comparisons between studies, and the construction of new code...
To conduct a fully independent and external validation of a research study based on one electronic health record database, using a different electronic database sampling the same population.
Using the Clinical Practice Research Datalink (CPRD), we replicated a published investigation into the effects of statins in patients with ischaemic heart dise...
Time-dependent natural phenomena and artificial processes can often be quantitatively expressed as multivariate time series (MTS). As in any other process of knowledge extraction from data, the analyst can benefit from the exploration of the characteristics of MTS through data visualization. This visualization often becomes difficult to interpret w...
Background:
The clinical investigation of human brain tumors often starts with a non-invasive imaging study, providing information about the tumor extent and location, but little insight into the biochemistry of the analyzed tissue. Magnetic Resonance Spectroscopy can complement imaging by supplying a metabolic fingerprint of the tissue. This stud...
The world of pharmacology is becoming increasingly dependent on the advances in the fields of genomics and proteomics. The –omics sciences bring about the challenge of how to deal with the large amounts of complex data they generate from an intelligent data analysis perspective. In this chapter, the authors focus on the analysis of a specific type...
The world of pharmacology is becoming increasingly dependent on the advances in the fields of genomics and proteomics. The -omics sciences bring about the challenge of how to deal with the large amounts of complex data they generate from an intelligent data analysis perspective. In this chapter, the authors focus on the analysis of a specific type...
The world of pharmacology is becoming increasingly dependent on the advances in the fields of genomics and proteomics. This dependency brings about the challenge of finding robust methods to analyze the complex data they generate. In this brief paper, we focus on the analysis of a specific type of proteins, the G protein-coupled receptors, which are...
Potassium homeostasis is crucial for living cells. In the yeast Saccharomyces cerevisiae, the uptake of potassium is driven by the electrochemical gradient generated by the Pma1 H+-ATPase, and this process represents a major consumer of the gradient. We considered that any mutation resulting in an alteration of the electrochemical gradient could gi...
The study of G protein-coupled receptors (GPCRs) is of great interest in pharmaceutical research, but only a few of their 3D structures are known at present. On the contrary, their amino acid sequences are known and accessible. Sequence analysis can provide new insight on GPCR function. Here, we use a kernel-based statistical machine learning model...
Proton Magnetic Resonance (MR) Spectroscopy (MRS) is a widely available technique for those clinical centres equipped with MR scanners. Unlike the rest of MR-based techniques, MRS yields not images but spectra of metabolites in the tissues. In pathological situations, the MRS profile changes and this has been particularly described for brain tumour...
Help and Manual of SpectraClassifier 1.0. The "Help and Manual of SpectraClassifier 1.0" provides more detailed technical information about the software.
SpectraClassifier (SC) is a Java solution for designing and implementing Magnetic Resonance Spectroscopy (MRS)-based classifiers. The main goal of SC is to allow users with minimum background knowledge of multivariate statistics to perform a fully automated pattern recognition analysis. SC incorporates feature selection (greedy stepwise approach, e...
A variational Bayesian formulation for a manifold-constrained Hidden Markov Model is used in this paper to segment a set of multivariate time series of electromyographic recordings corresponding to stroke patients and control subjects. An index of variability associated with this model is defined and applied to the robust detection of the silent peri...
A kernel version of Generative Topographic Mapping, a model of the manifold learning family, is defined in this paper. Its ability to adequately model non-i.i.d. data is illustrated in a problem concerning the identification of protein subfamilies from protein sequences.
The exploratory investigation of multivariate time series (MTS) may become extremely difficult, if not impossible, for high dimensional datasets. Paradoxically, to date, little research has been conducted on the exploration of MTS through unsupervised clustering and visualization. In this chapter, the authors describe generative topographic mapping...
General finite mixture models are powerful tools for the density-based grouping of multivariate i.i.d. data, but they lack data visualization capabilities, which reduces their practical applicability to real-world problems. Generative topographic mapping (GTM) was originally formulated as a constrained mixture of distributions in order to provide s...
Justification
Automatic brain tumor classification by MRS has been under development for more than a decade. Nonetheless, to our knowledge, there are no published evaluations of predictive models with unseen cases that are subsequently acquired in different centers. The multicenter eTUMOUR project (2004–2009), which builds upon previous expertise f...
Most of the existing research on multivariate time series concerns supervised forecasting problems. In comparison, little research has been devoted to their exploration through unsupervised clustering and visualization. In this paper, the capabilities of Generative Topographic Mapping Through Time, a model with foundations in probability theory, th...