Harry Hemingway

University College London | UCL · Institute of Health Informatics

Bachelor of Medicine

About

582
Publications
146,466
Reads
39,931
Citations
Introduction
H index 70, with 46,028 citations, 200+ publications and 20 years' leadership in using electronic health record data across early and late phases of translational research. My research has been cited in 7 major clinical guidelines and public health policies with implications for the health of millions. Current Director roles span >100 staff, 75 post-graduate students and a budget of >£30m, with strategic national and international partnerships and wide impacts.
Additional affiliations
August 2014 - present
University College London
Position
  • Managing Director
March 2013 - present
Farr Institute of Health Informatics Research, London
Position
  • Managing Director
March 2017 - present
NIHR Biomedical Research Centre
Position
  • Managing Director
Education
October 2004 - October 2005
Royal College of Physicians
Field of study
  • Medicine
October 2004 - October 2005
Faculty of Public Health
Field of study
  • Public Health and Epidemiology
October 1985 - October 1988
University of Cambridge
Field of study
  • Medicine

Publications (582)
Article
Full-text available
Background Despite the growing interest in the use of human genomic data for drug target identification and validation, the extent to which the spectrum of human disease has been addressed by genome-wide association studies (GWAS), or by drug development, and the degree to which these efforts overlap remain unclear. Methods In this study we harmon...
Preprint
Full-text available
Background: Accurate and reproducible phenotyping algorithms are essential to enable research at scale. Creating disease definitions in large biobanks is challenging as it involves combining data from multiple sources and modalities, and electronic health records using different medical ontologies. Our framework includes computational methods to 1)...
Article
Full-text available
Objective The HDRUK Phenotype Library (phenotypes.healthdatagateway.org) shares definitions used to measure concepts of interest (such as diagnoses or treatments) within health datasets. It already holds more than 1000 phenotypes, with researchers able to contribute their work via an API. We aimed to create a more user-friendly method of contributi...
Article
Full-text available
Background Despite the increasing availability of electronic healthcare record (EHR) data and wide availability of plug-and-play machine learning (ML) Application Programming Interfaces, the adoption of data-driven decision-making within routine hospital workflows thus far, has remained limited. Through the lens of deriving clusters of diagnoses by...
Article
Full-text available
For many diseases there are delays in diagnosis due to a lack of objective biomarkers for disease onset. Here, in 41,931 individuals from the United Kingdom Biobank Pharma Proteomics Project, we integrated measurements of ~3,000 plasma proteins with clinical information to derive sparse prediction models for the 10-year incidence of 218 common and...
Article
Full-text available
Background Early evidence that patients with (multiple) pre-existing diseases are at highest risk for severe COVID-19 has been instrumental in the pandemic to allocate critical care resources and later vaccination schemes. However, systematic studies exploring the breadth of medical diagnoses are scarce but may help to understand severe COVID-19 am...
Article
Full-text available
Background Our study examined whether prevalent and incident comorbidities are increased in idiopathic pulmonary fibrosis (IPF) patients when compared to matched chronic obstructive pulmonary disease (COPD) patients and control subjects without IPF or COPD. Methods IPF and age, gender and smoking matched COPD patients, diagnosed between 01/01/1997...
Article
Background Electronic health records (EHRs) have the potential to be used to produce detailed disease burden estimates. In this study we created disease estimates using national EHR for three high burden conditions, compared estimates between linked and unlinked datasets and produced stratified estimates by age, sex, ethnicity, socio-economic depri...
Article
Objective To enable reproducible research at scale by creating a platform that enables health data users to find, access, curate, and re-use electronic health record phenotyping algorithms. Materials and Methods We undertook a structured approach to identifying requirements for a phenotype algorithm platform by engaging with key stakeholders. User...
Article
Full-text available
The COVID-19 pandemic exposed a global deficiency of systematic, data-driven guidance to identify high-risk individuals. Here, we illustrate the utility of routinely recorded medical history to predict the risk for 1883 diseases across clinical specialties and support the rapid response to emerging health threats such as COVID-19. We developed a ne...
Article
Full-text available
Background An electronic health record (EHR) holds detailed longitudinal information about a patient's health status and general clinical history, a large portion of which is stored as unstructured, free text. Existing approaches to model a patient's trajectory focus mostly on structured data and a subset of single-domain outcomes. This study aims...
Preprint
Full-text available
To better understand sex differences in human health and disease, we conducted a systematic, large-scale investigation of sex differences in the genetic regulation of the plasma proteome (>5,000 targets), including their disease relevance. Plasma levels of two-thirds of protein targets differed significantly by sex. In contrast, genetic effects on...
Article
Full-text available
Background The occurrence of a range of health outcomes following myocardial infarction (MI) is unknown. Therefore, this study aimed to determine the long-term risk of major health outcomes following MI and generate sociodemographic stratified risk charts in order to inform care recommendations in the post-MI period and underpin shared decision mak...
Article
Full-text available
Background/Objectives When studying the effect of weight change between two time points on a health outcome using observational data, two main problems arise initially (i) ‘when is time zero?’ and (ii) ‘which confounders should we account for?’ From the baseline date or the 1st follow-up (when the weight change can be measured)? Different methods h...
Article
Full-text available
Objective To clarify the performance of polygenic risk scores in population screening, individual risk prediction, and population risk stratification. Design Secondary analysis of data in the Polygenic Score Catalog. Setting Polygenic Score Catalog, April 2022. Secondary analysis of 3915 performance metric estimates for 926 polygenic risk scores...
Preprint
Full-text available
Background: The Global Burden of Disease study has provided key evidence to inform clinicians, researchers, and policy makers across common diseases, but no similar effort with single study design exists for hundreds of rare diseases. Consequently, many rare conditions lack population-level evidence including prevalence and clinical vulnerability....
Article
Full-text available
Raynaud’s phenomenon (RP) is a common vasospastic disorder that causes severe pain and ulcers, but despite its high reported heritability, no causal genes have been robustly identified. We conducted a genome-wide association study including 5,147 RP cases and 439,294 controls, based on diagnoses from electronic health records, and identified three...
Preprint
Full-text available
Heart failure (HF), a syndrome of symptomatic fluid overload due to cardiac dysfunction, is the most rapidly growing cardiovascular disorder. Despite recent advances, mortality and morbidity remain high and treatment innovation is challenged by limited understanding of aetiology in relation to disease subtypes. Here we harness the de-conf...
Preprint
Full-text available
Dilated cardiomyopathy (DCM) is a primary heart muscle disorder that has been associated with both monogenic and polygenic architectures, with the majority of cases having no molecular genetic diagnosis. To improve our understanding of the genetic basis of DCM, we perform a genome-wide association study (GWAS) meta-analysis comprising 14,255 DCM ca...
Preprint
Full-text available
Background: For many diseases there are delays in diagnosis due to a lack of objective biomarkers for disease onset. Whether measuring thousands of proteins offers predictive information across a wide range of diseases is unknown. Methods: In 41,931 individuals from the UK Biobank Pharma Proteomics Project (UKB-PPP), we integrated ~3000 plasma pr...
Article
Background: Electronic health records (EHRs) have the potential to be used to produce detailed disease burden estimates. In this study we created disease estimates using national EHR for three high burden conditions, compared estimates between linked and unlinked datasets and produced stratified estimates by age, sex, ethnicity, socio-economic depr...
Article
Background: Incident events of cardiovascular diseases (CVD) are heterogeneous and may result in different mortality risks. Such evidence may help inform patient and physician decisions in CVD prevention and risk factor management. Aim: To determine the extent to which incident events of common CVD show heterogeneous associations with subsequent...
Article
Background: Machine learning has been used to analyse heart failure subtypes, but not across large, distinct, population-based datasets, across the whole spectrum of causes and presentations, or with clinical and non-clinical validation by different machine learning methods. Using our published framework, we aimed to discover heart failure subtype...
Preprint
Full-text available
Early evidence that patients with (multiple) pre-existing diseases are at highest risk for severe COVID-19 has been instrumental in the pandemic to allocate critical care resources and later vaccination schemes. However, systematic studies exploring the breadth of medical diagnoses, including common, but non-fatal diseases are scarce, but may help...
Article
Full-text available
Funding Acknowledgements Type of funding sources: None. Background Cardiovascular diseases (CVD) present as diverse phenotypes, which have considerably different effects on subsequent prognosis. However, studies quantifying these differences in prognosis after various first presentations of CVD are so far lacking. Purpose Our main study objective...
Preprint
Full-text available
The COVID-19 pandemic exposed, with few exceptions, a global deficiency in delivering systematic, data-driven guidance to protect citizens and coordinate vaccination programs. At the same time, medical histories are routinely recorded in most healthcare systems and are instantly available for risk assessment. Here, we demonstrate the utility of med...
Article
Full-text available
Background: Most adults presenting in primary care with chest pain symptoms will not receive a diagnosis ("unattributed" chest pain) but are at increased risk of cardiovascular events. Aim: To assess within patients with unattributed chest pain, risk factors for cardiovascular events and whether those at greatest risk of cardiovascular disease c...
Article
Full-text available
Background: Although chronic kidney disease (CKD) is associated with high multimorbidity, polypharmacy, morbidity and mortality, existing classification systems (mild to severe, usually based on estimated glomerular filtration rate, proteinuria or urine albumin-creatinine ratio) and risk prediction models largely ignore the complexity of CKD, its...
Article
Full-text available
Background Several common conditions have been widely recognised as risk factors for COVID-19 related death, but risks borne by people with rare diseases are largely unknown. Therefore, we aim to estimate the difference in risk for people with rare diseases compared with unaffected people. Method To estimate the correlation between rare diseases and C...
Article
Full-text available
BACKGROUND: Globally, there is a paucity of multimorbidity and comorbidity data, especially for minority ethnic groups and younger people. We estimated the frequency of common disease combinations and identified non-random disease associations for all ages in a multiethnic population. METHODS: In this population-based study, we examined multimorbid...
Article
Full-text available
Objectives To use national, pre- and post-pandemic electronic health records (EHR) to develop and validate a scenario-based model incorporating baseline mortality risk, infection rate (IR) and relative risk (RR) of death for prediction of excess deaths. Design An EHR-based, retrospective cohort study. Setting Linked EHR in Clinical Practice Resea...
Article
Full-text available
Garrod’s concept of ‘chemical individuality’ has contributed to comprehension of the molecular origins of human diseases. Untargeted high-throughput metabolomic technologies provide an in-depth snapshot of human metabolism at scale. We studied the genetic architecture of the human plasma metabolome using 913 metabolites assayed in 19,994 individual...
Article
Background Globally, there is a paucity of multimorbidity and comorbidity data, especially for minority ethnic groups and younger people. We estimated the frequency of common disease combinations and identified non-random disease associations for all ages in a multiethnic population. Methods In this population-based study, we examined multimorbidi...
Preprint
Full-text available
Raynaud's phenomenon (RP) is a common vasospastic disorder that causes severe pain and ulcers, but despite its high reported heritability, no causal genes have been robustly identified. We conducted a genome-wide association study including 5,147 RP cases and 439,294 controls, based on diagnoses from electronic health records, and identified three...
Article
Full-text available
Big data is important to new developments in global clinical science that aim to improve the lives of patients. Technological advances have led to the regular use of structured electronic health-care records with the potential to address key deficits in clinical evidence that could improve patient care. The COVID-19 pandemic has shown this potentia...
Article
Full-text available
Individuals with South Asian ancestry have a higher risk of heart disease than other groups but have been largely excluded from genetic research. Using data from 22,000 British Pakistani and Bangladeshi individuals with linked electronic health records from the Genes & Health cohort, we conducted genome-wide association studies of coronary artery d...
Article
Full-text available
Chronic kidney disease (CKD) is associated with increased risk of baseline mortality and severe COVID-19, but analyses across CKD stages, and comorbidities are lacking. In prevalent and incident CKD, we investigated comorbidities, baseline risk, COVID-19 incidence, and predicted versus observed one-year excess death. In a national dataset (NHS Digi...
Article
Full-text available
Background Assessing the spectrum of disease risk associated with hypertriglyceridemia is needed to inform potential benefits from emerging triglyceride lowering treatments. We sought to examine the associations between a full range of plasma triglyceride concentration with five clinical outcomes. Methods We used linked data from primary and secon...
Article
Full-text available
BACKGROUND: Updatable estimates of COVID-19 onset, progression, and trajectories underpin pandemic mitigation efforts. To identify and characterise disease trajectories, we aimed to define and validate ten COVID-19 phenotypes from nationwide linked electronic health records (EHR) using an extensible framework. METHODS: In this cohort study, we used...
Article
Full-text available
Background Updatable estimates of COVID-19 onset, progression, and trajectories underpin pandemic mitigation efforts. To identify and characterise disease trajectories, we aimed to define and validate ten COVID-19 phenotypes from nationwide linked electronic health records (EHR) using an extensible framework. Methods In this cohort study, we used e...
Conference Paper
Introduction Acute stroke accounts for significant morbidity and mortality globally. The role of troponin for risk stratification in stroke is unclear. The aims of this study were to assess the relationship between peak troponin and mortality in patients with ischemic stroke, haemorrhagic stroke, or subarachnoid haemorrhage and to compare this with...
Conference Paper
Background Cardiac troponin is commonly raised in patients with malignancy and may aid clinicians in risk prediction. The prognostic significance of raised troponin in these patients with known malignancies remains unclear. We sought to investigate the relation between troponin and mortality in a large, well characterised cohort of patients undergo...
Conference Paper
Background A positive cardiac troponin (cTn) is an independent predictor of short-term mortality in individuals presenting with acute pulmonary embolism (PE). However, there is limited evidence regarding the impact age has on the association between cTn levels and mortality in patients with PE. The aim of our study was to investigate the relationsh...
Conference Paper
Introduction It has been challenging for researchers to access granular electronic health record (EHR) data at scale. The NIHR Health Informatics Collaborative (HIC) enables the sharing of routine EHR data across NHS hospitals for research. One emerging prospect is to use big data to traverse the translational spectrum. As an early discovery phase...
Article
Full-text available
Importance A lack of internationally agreed standards for combining available data sources at scale risks inconsistent disease phenotyping limiting research reproducibility. Objective To develop and then evaluate if a rules-based algorithm can identify coronary artery disease (CAD) sub-phenotypes using electronic health records (EHR) and questionn...
Article
Implications of elevated troponin on time-to-surgery in non-ST elevation myocardial infarction (NIHR Health Informatics Collaborative: TROP-CABG study). Benedetto et al. Background The optimal timing of coronary artery bypass grafting (CABG) in patients with non-ST elevation myocardial infarction (NSTEMI) and the utility of pre-operative troponin le...
Article
Full-text available
Background Variation in care is often poorly understood but has a big impact on patients. Non-ST segment elevation myocardial infarction (NSTEMI, also known as non-ST elevation acute coronary syndrome or NSTE-ACS) is the most common form of heart attack. NSTEMI is frequently hard to diagnose, its management pathway poorly defined and there is consi...
Article
Full-text available
AIM The optimal strategy for diabetes control in patients with heart failure (HF) following myocardial infarction (MI) remains unknown. Metformin, a guideline-recommended therapy for patients with chronic HF and type 2 diabetes mellitus (T2DM), is associated with reduced mortality and HF hospitalizations. However, worse outcomes have been reported...
Article
Full-text available
Background A minority of acute coronary syndrome (ACS) cases are associated with ventricular arrhythmias (VA) and/or cardiac arrest (CA). We investigated the effect of VA/CA at the time of ACS on long‐term outcomes. Methods and Results We analyzed routine clinical data from 5 National Health Service trusts in the United Kingdom, collected between...
Preprint
Full-text available
Background Throughout the pandemic, research, public health, and policy emphasised prediction and surveillance of excess deaths, which have mostly occurred in older individuals with underlying conditions, highlighting the importance of baseline mortality risk, infection rate (IR) and pandemic-related relative risk (RR). We now use national, pre- and pos...
Preprint
Full-text available
Background The clinical value of polygenic risk scores has been questioned. We sought to clarify performance in population screening, individual risk prediction and population risk stratification by analysing 926 polygenic risk scores for 310 diseases from the Polygenic Score (PGS) Catalog. Methods Polygenic risk scores in the PGS Catalog are repo...
Article
Objective: To establish and validate mappings between primary care clinical terminologies (Read Version 2, Clinical Terms Version 3) and Phecodes. Methods: We processed 123,662,421 primary care events from 230,096 UK Biobank (UKB) participants. We assessed the validity of the primary care-derived Phecodes by conducting PheWAS analyses for seven pre...
Article
Full-text available
Background Primary prevention strategies for heart failure (HF) have had limited success, possibly due to a wide range of underlying risk factors (RFs). Systematic evaluations of the prognostic burden and preventive potential across this wide range of risk factors are lacking. Objective To estimate evidence, prevalence and co-occurrence for primary...
Preprint
Full-text available
Background Updatable understanding of the onset and progression of individuals' COVID-19 trajectories underpins pandemic mitigation efforts. In order to identify and characterize individual trajectories, we defined and validated ten COVID-19 phenotypes from linked electronic health records (EHR) on a nationwide scale using an extensible framework....
Article
Full-text available
Patients and public have sought mortality risk information throughout the pandemic, but their needs may not be served by current risk prediction tools. Our mixed methods study involved: (1) systematic review of published risk tools for prognosis, (2) provision and patient testing of new mortality risk estimates for people with high-risk conditions...
Article
Background Patients presenting to primary care with chest pain are often not given a cause. Patients with such unattributed chest pain have an increased risk of future cardiovascular disease (CVD) compared to patients with diagnosed non-coronary chest pain. It is unknown whether risk factors for CVD determined in the general population are the same...
Article
Full-text available
Background An Informatics Consult has been proposed in which clinicians request novel evidence from large scale health data resources, tailored to the treatment of a specific patient. However, the availability of such consultations is lacking. We seek to provide an Informatics Consult for a situation where a treatment indication and contraindicatio...