Julius Upmeier zu Belzen’s research while affiliated with Berlin Institute of Health at Charité - Universitätsmedizin Berlin and other places


Publications (15)


Overview of the study
a The medical history captures encounters with primary and secondary care, including diagnoses, medications, and procedures (ideally) from birth. Here we train a multi-layer perceptron on data before recruitment to predict phenome-wide incident disease onset for 1741 endpoints. b Location and size of the 22 assessment centers of the UK Biobank cohort across England, Wales, and Scotland. c To learn risk states from individual medical histories, the UK Biobank population was partitioned by their respective assessment center at recruitment. d For each of the 22 partitions, the Risk Model was trained to predict phenome-wide incident disease onset for 1741 endpoints. Subsequently, for each endpoint, Cox proportional hazard (CPH) models were developed on the risk states in combination with sets of commonly available predictors to model disease risk. Predictions of the CPH model on the test set were aggregated for downstream analysis. e External validation in the All of Us cohort. After mapping to the OMOP vocabulary, we transferred the trained risk model to the All of Us cohort and calculated the risk state for all endpoints. To validate these risk states, we compared the unchanged CPH models developed in the UK Biobank with refitted CPH models for age and sex. Source data are provided. The icons are made by Freepik from www.flaticon.com.
Routine health records stratify phenome-wide disease onset
a Ratio of incident events in the Top 10% compared with the Bottom 10% of the estimated risk states. Event rates in the Top 10% are higher than in the Bottom 10% for all but one of the 1741 investigated endpoints. Red dots indicate 24 selected endpoints detailed in Fig. 2b. To illustrate, 1053 (2.10%) individuals in the top risk decile for cardiac arrest experienced an event compared with only 59 (0.12%) in the bottom decile, with a risk ratio of 17.85. b Incident event rates for each medical history risk percentile (if medical history was available) for a selection of 24 endpoints. c Cumulative event rates with 95% confidence intervals for the Top 1%, median, and Bottom 1% of risk percentiles in (b) over 15 years. Statistical measures were derived from 502,489 individuals. Individuals with prevalent diseases were excluded from the endpoint-specific analysis. Source data are provided.
Discriminative performance indicates potential utility
a Differences in discriminatory performance quantified by the C-Index between CPH models trained on Age + Sex and Age + Sex + MedicalHistory for all 1741 endpoints. We found significant improvements over the baseline model (Age + Sex, age, and biological sex only) for 1546 (88.8%) of the 1741 investigated endpoints. Red dots indicate selected endpoints in Fig. 3b. b Absolute discriminatory performance in terms of C-Index comparing the baseline (Age + Sex, black point) with the added routine health records risk state (Age + Sex + RiskState, red point) for a selection of 24 endpoints. c The direct C-index differences for the same models. Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. d Example of individual predicted phenome-wide risk profile. Predisposition (10-year risk estimated by Age + Sex + RiskState compared to risk estimated by Age + Sex alone) is displayed in the inner circle, and absolute 10-year risk estimated by Age + Sex + RiskState can be found in the outer circle. Labels indicate endpoints with a high individual predisposition (>2 times higher than the Age + Sex-based reference estimate) and absolute 10-year risk >10%. e Top 5 highest attributed records for selected endpoints. Statistical measures were derived from 502,489 individuals. Source data are provided.
Predictive models can generalize across healthcare systems and populations
a External validation of the differences in discriminatory performance quantified by the C-Index between CPH models trained on age and biological sex and age, biological sex, and the risk state for 1519 endpoints in the All of Us cohort. We find significant improvements over the baseline model (age and biological sex only) for 1171 (77.1%) of the 1519 investigated endpoints. b Direct comparison of the absolute C-Index in the UK Biobank (x-axis) and the All of Us cohort (y-axis). Significant improvements can be replicated for 1115 (78.9%, green points) of 1414 endpoints in the All of Us cohort. c Comparison of mean delta C-Index per delta percentile (derived from the UK Biobank from the 1519 endpoints available in All of Us). Improvements in the All of Us cohort are consistent with the UK Biobank cohort: Small improvements in the UK Biobank tend to be larger in All of Us, while large improvements in the UK Biobank tend to be attenuated in All of Us. d Distribution of C-Indices for the 1519 investigated endpoints stratified by communities historically underrepresented in biomedical research (UPD)⁷³. Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. e For the same groups, confidence intervals for the additive performance as measured by the C-Index compared to the baseline model. Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. f Absolute discriminatory performance in terms of C-Index comparing the baseline (age and biological sex, black point) with the added routine health records risk state (red points) for a selection of 24 endpoints. g The differences in C-index for the same models. Statistical measures for UKB in (b and c) were derived from 502,489 individuals, and for AoU in (a–g) were derived from 259,234 individuals. 
Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. Source data are provided.
Predictions can support cardiovascular disease prevention and the response to emerging health threats
a Discriminatory performances in terms of absolute C-Indices comparing risk scores (Age + Sex, SCORE2, ASCVD, and QRISK as indicated, black point) with the risk model based on Age + Sex + RiskState (red segment). b Direct differences between risk scores (Age + Sex, SCORE2, ASCVD, and QRISK as indicated) and the risk model based on Age + Sex + RiskState in terms of C-index. Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. c Estimated cumulative event trajectories, including 95% confidence intervals of severe (with hospitalization) and fatal (death registry) COVID-19 outcomes stratified by the Top, Median, and Bottom 5% based on age (left), or risk states of pneumonia, sepsis and all-cause mortality as estimated by Kaplan-Meier analysis. Statistical measures were derived from 502,489 individuals. Source data are provided.


Medical history predicts phenome-wide disease onset and enables the rapid response to emerging health threats
  • Article
  • Full-text available

January 2025 · 55 Reads · 1 Citation

Jakob Steinfeldt · Thore Buergel · [...] · Roland Eils

The COVID-19 pandemic exposed a global deficiency of systematic, data-driven guidance to identify high-risk individuals. Here, we illustrate the utility of routinely recorded medical history to predict the risk for 1741 diseases across clinical specialties and support the rapid response to emerging health threats such as COVID-19. We developed a neural network to learn from health records of 502,489 UK Biobank participants. Importantly, we observed discriminative improvements over basic demographic predictors for 1546 (88.8%) endpoints. After transferring the unmodified risk models to the All of Us cohort, we replicated these improvements for 1115 (78.9%) of 1414 investigated endpoints, demonstrating generalizability across healthcare systems and historically underrepresented groups. Ultimately, we showed how this approach could have been used to identify individuals vulnerable to severe COVID-19. Our study demonstrates the potential of medical history to support guidance for emerging pandemics by systematically estimating risk for thousands of diseases at once at minimal cost.
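The figure captions for this article summarize risk stratification as the ratio of event rates between the top and bottom deciles of the estimated risk state. The following is a minimal sketch of that comparison, not the authors' code: the function name `decile_risk_ratio` and the synthetic inputs are assumptions made for illustration only.

```python
import numpy as np

def decile_risk_ratio(risk, event):
    """Event rate in the top risk decile divided by the rate in the bottom decile.

    Hypothetical helper for illustration; `risk` holds one risk estimate per
    individual and `event` flags whether that individual had an incident event.
    """
    risk = np.asarray(risk, dtype=float)
    event = np.asarray(event, dtype=bool)
    low_cut, high_cut = np.quantile(risk, [0.1, 0.9])
    top_rate = event[risk >= high_cut].mean()     # share of events in the top 10%
    bottom_rate = event[risk <= low_cut].mean()   # share of events in the bottom 10%
    return top_rate / bottom_rate

# Synthetic example: 100 individuals with risk scores 0..99.
risk = np.arange(100)
event = np.zeros(100, dtype=bool)
event[90:98] = True   # 8 events among the 10 highest-risk individuals
event[0:2] = True     # 2 events among the 10 lowest-risk individuals
ratio = decile_risk_ratio(risk, event)

# The cardiac arrest figure quoted in the caption is consistent with the
# ratio of the reported event counts.
caption_ratio = 1053 / 59
```

In the synthetic example the top-decile rate is 0.8 and the bottom-decile rate is 0.2. The caption's value of 17.85 for cardiac arrest agrees with the ratio of the two reported counts (1053 and 59).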


An open-source framework for end-to-end analysis of electronic health record data

September 2024 · 145 Reads · 4 Citations

Nature Medicine

With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here we introduce ehrapy, a modular open-source Python framework designed for exploratory analysis of heterogeneous epidemiology and EHR data. ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrate ehrapy’s features in six distinct examples. We applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we reveal biomarkers for significant differences in survival among these groups. Additionally, we quantify medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. We reconstructed disease state trajectories in patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) based on imaging data. Finally, we conducted a case study to demonstrate how ehrapy can detect and mitigate biases in EHR data. ehrapy, thus, provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.


Overview of the study
a The medical history captures encounters with primary and secondary care, including diagnoses, medications, and procedures (ideally) from birth. Here we train a multi-layer perceptron on data before recruitment to predict phenome-wide incident disease onset for 1883 endpoints. b Location and size of the 22 assessment centers of the UK Biobank cohort across England, Wales, and Scotland. c To learn risk states from individual medical histories, the UK Biobank population was partitioned by their respective assessment center at recruitment. d For each of the 22 partitions, the Risk Model was trained to predict phenome-wide incident disease onset for 1883 endpoints. Subsequently, for each endpoint, Cox proportional hazard (CPH) models were developed on the risk states in combination with sets of commonly available predictors to model disease risk. Predictions of the CPH model on the test set were aggregated for downstream analysis. e External validation in the All of Us cohort. After mapping to the OMOP vocabulary, we transferred the trained risk model to the All of Us cohort and calculated the risk state for all endpoints. To validate these risk states, we compared the unchanged CPH models developed in the UK Biobank with refitted CPH models for age and sex. Source data are provided. The icons are made by Freepik from www.flaticon.com.
Routine health records stratify phenome-wide disease onset
a Ratio of incident events in the Top 10% compared with the Bottom 10% of the estimated risk states. Event rates in the Top 10% are higher than in the Bottom 10% for all but one of the 1883 investigated endpoints. Red dots indicate 24 selected endpoints detailed in Fig. 2b. To illustrate, 1198 (2.39%) individuals in the top risk decile for cardiac arrest experienced an event compared with only 30 (0.06%) in the bottom decile, with a risk ratio of 39.93. b Incident event rates for each medical history risk percentile (if medical history was available) for a selection of 24 endpoints. c Cumulative event rates with 95% confidence intervals for the Top 1%, median, and Bottom 1% of risk percentiles in (b) over 15 years. Statistical measures were derived from 502,460 individuals. Individuals with prevalent diseases were excluded from the endpoint-specific analysis. Source data are provided.
Discriminative performance indicates potential utility
a Differences in discriminatory performance quantified by the C-Index between CPH models trained on Age+Sex and Age+Sex+MedicalHistory for all 1883 endpoints. We found significant improvements over the baseline model (Age+Sex, age, and biological sex only) for 1774 (94.2%) of the 1883 investigated endpoints. Red dots indicate selected endpoints in Fig. 3b. b Absolute discriminatory performance in terms of C-Index comparing the baseline (Age+Sex, black point) with the added routine health records risk state (Age+Sex+RiskState, red point) for a selection of 24 endpoints. c The direct C-index differences for the same models. Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. d Example of individual predicted phenome-wide risk profile. Predisposition (10-year risk estimated by Age+Sex+RiskState compared to risk estimated by Age+Sex alone) is displayed in the inner circle, and absolute 10-year risk estimated by Age+Sex+RiskState can be found in the outer circle. Labels indicate endpoints with a high individual predisposition (>2 times higher than the Age+Sex-based reference estimate) and absolute 10-year risk > 10%. e Top 5 highest attributed records for selected endpoints. Statistical measures were derived from 502,460 individuals. Source data are provided.
Predictive models can generalize across healthcare systems and populations
a External validation of the differences in discriminatory performance quantified by the C-Index between CPH models trained on age and biological sex and age, biological sex, and the risk state for 1568 endpoints in the All of Us cohort. We find significant improvements over the baseline model (age and biological sex only) for 1347 (85.9%) of the 1568 investigated endpoints. b Direct comparison of the absolute C-Index in the UK Biobank (x-axis) and the All of Us cohort (y-axis). Significant improvements can be replicated for 1347 (89.8%, green points) of 1500 endpoints in the All of Us cohort. c Comparison of mean delta C-Index per delta percentile (derived from the UK Biobank from the 1568 endpoints available in All of Us). Improvements in the All of Us cohort are consistent with the UK Biobank cohort: Small improvements in the UK Biobank tend to be larger in All of Us, while large improvements in the UK Biobank tend to be attenuated in All of Us. d Distribution of C-Indices for the 1568 investigated endpoints stratified by communities historically underrepresented in biomedical research (UPD)⁷³. Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. e For the same groups, confidence intervals for the additive performance as measured by the C-Index compared to the baseline model. Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. f Absolute discriminatory performance in terms of C-Index comparing the baseline (age and biological sex, black point) with the added routine health records risk state (red points) for a selection of 24 endpoints. g The differences in C-index for the same models. Statistical measures for UKB (in b and c) were derived from 502,460 individuals, and for AoU (in a–g) were derived from 229,830 individuals. 
Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. Source data are provided.
Predictions can support cardiovascular disease prevention and the response to emerging health threats
a Discriminatory performances in terms of absolute C-Indices comparing risk scores (Age+Sex, SCORE2, ASCVD, and QRISK as indicated, black point) with the risk model based on Age+Sex+RiskState (red segment). b Direct differences between risk scores (Age+Sex, SCORE2, ASCVD, and QRISK as indicated) and the risk model based on Age+Sex+RiskState in terms of C-index. Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. c Estimated cumulative event trajectories, including 95% confidence intervals of severe (with hospitalization) and fatal (death registry) COVID-19 outcomes stratified by the Top, Median, and Bottom 5% based on age (left) or risk states of pneumonia, sepsis, and all-cause mortality as estimated by Kaplan-Meier analysis. Statistical measures were derived from 502,460 individuals. Source data are provided.
RETRACTED ARTICLE: Medical history predicts phenome-wide disease onset and enables the rapid response to emerging health threats

May 2024 · 131 Reads · 3 Citations

The COVID-19 pandemic exposed a global deficiency of systematic, data-driven guidance to identify high-risk individuals. Here, we illustrate the utility of routinely recorded medical history to predict the risk for 1883 diseases across clinical specialties and support the rapid response to emerging health threats such as COVID-19. We developed a neural network to learn from health records of 502,460 UK Biobank participants. Importantly, we observed discriminative improvements over basic demographic predictors for 1774 (94.3%) endpoints. After transferring the unmodified risk models to the All of Us cohort, we replicated these improvements for 1347 (89.8%) of 1500 investigated endpoints, demonstrating generalizability across healthcare systems and historically underrepresented groups. Ultimately, we showed how this approach could have been used to identify individuals vulnerable to severe COVID-19. Our study demonstrates the potential of medical history to support guidance for emerging pandemics by systematically estimating risk for thousands of diseases at once at minimal cost.


A predictive atlas of disease onset from retinal fundus photographs

March 2024 · 131 Reads

Early detection of high-risk individuals is crucial for healthcare systems to cope with changing demographics and an ever-increasing patient population. Images of the retinal fundus are a non-invasive, low-cost examination routinely collected and potentially scalable beyond ophthalmology. Prior work demonstrated the potential of retinal images for risk assessment for common cardiometabolic diseases, but it remains unclear whether this potential extends to a broader range of human diseases. Here, we extended a retinal foundation model (RETFound) to systematically explore the predictive potential of retinal images as a low-cost screening strategy for disease onset across >750 incident diseases in >60,000 individuals. For more than a third (n=308) of the diseases, we demonstrated improved discriminative performance compared to readily available patient characteristics. This included 281 diseases outside of ophthalmology, such as type 2 diabetes (Delta C-Index: UK Biobank +0.073 (0.068, 0.079)) or chronic obstructive pulmonary disease (Delta C-Index: UK Biobank +0.047 (0.039, 0.054)), showcasing the potential of retinal images to complement screening strategies more widely. Moreover, we externally validated these findings in 7,248 individuals from the EPIC-Norfolk Eye Study. Notably, retinal information did not improve the prediction for the onset of cardiovascular diseases compared to established primary prevention scores, demonstrating the need for rigorous benchmarking and disease-agnostic efforts to design cost-efficient screening strategies to improve population health. We demonstrated that predictive improvements were attributable to retinal vascularisation patterns and less obvious features, such as eye colour or lens morphology, by extracting image attributions from risk models and performing genome-wide association studies, respectively. Genetic findings further highlighted commonalities between eye-derived risk estimates and complex disorders, including novel loci, such as IMAP1, for iron homeostasis. In conclusion, we present the first comprehensive evaluation of predictive information derived from retinal fundus photographs, illustrating the potential and limitations of easily accessible and low-cost retinal images for risk assessment across common and rare diseases.


Exploratory electronic health record analysis with ehrapy

December 2023 · 94 Reads

With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here, we introduce ehrapy, a modular open-source Python framework designed for exploratory end-to-end analysis of heterogeneous epidemiology and electronic health record data. Ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference, and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrated ehrapy's features in five distinct examples: We first applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we revealed biomarkers for significant differences in survival among these groups. Additionally, we quantified medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. Finally, we reconstructed disease state trajectories in SARS-CoV-2 patients based on imaging data. Ehrapy thus provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.


Figure 2: Routine health records stratify phenome-wide disease onset: a) Ratio of incident events in the Top 10% compared with the Bottom 10% of the estimated risk states. Event rates in the Top 10% are higher than in the Bottom 10% for all 1,883 investigated endpoints. Red dots indicate 24 selected endpoints detailed in Fig. 2b. To illustrate, 1,238 (2.49%) individuals in the top risk decile for cardiac arrest experienced an event compared with only 29 (0.06%) in the bottom decile,
Figure 3: Discriminative performance indicates potential utility: a) Differences in discriminatory performance quantified by the C-Index between CPH models trained on Age+Sex and Age+Sex+RiskState for all 1,883 endpoints. We find significant improvements over the baseline model (Age+Sex, age, and biological sex only) for 1800 (95.6%) of the 1,883 investigated endpoints. Red dots indicate selected endpoints in Fig. 3b. b) Absolute discriminatory performance in terms of C-Index comparing the baseline (Age+Sex, black point) with the added routine health records risk state (Age+Sex+RiskState, red point) for a selection of 24 endpoints. c) The direct C-index differences for the same models. Dots indicate medians and whiskers extend to the 95% confidence interval for a distribution bootstrapped over 100 iterations. d) Example of individual predicted phenome-wide risk profile. Predisposition (10-year risk estimated by Age+Sex+RiskState compared to risk estimated by Age+Sex alone) is displayed in the inner circle, and absolute 10-year risk estimated by Age+Sex+RiskState can be found in the outer circle. Labels indicate endpoints with a high individual predisposition (> 2 times higher than the Age+Sex-based reference estimate) and absolute 10-year risk > 10%. e) Top 5 highest attributed records for selected endpoints.
Figure 4: Predictive models can generalize across health care systems and populations: a) External validation of the differences in discriminatory performance quantified by the C-Index between CPH models trained on age and biological sex and age, biological sex and the risk state for 1,568 endpoints in the All of Us cohort. We find significant improvements over the baseline model (age and biological sex only) for 1,310 (83.5%) of the 1,568 investigated endpoints. b) Absolute discriminatory performance in terms of C-Index comparing the baseline (age and biological sex, black point) with the added routine health records risk state (red point) for a selection of 24 endpoints. c) The differences in C-index for the same models. d) Distribution of C-Indices for the 1,568 investigated endpoints stratified by communities historically underrepresented in biomedical research (UPD). e) For the same groups, confidence intervals for the additive performance as measured by the C-Index compared to the baseline model.
Medical history predicts phenome-wide disease onset

March 2023 · 271 Reads · 1 Citation

The COVID-19 pandemic exposed, with few exceptions, a global deficiency in delivering systematic, data-driven guidance to protect citizens and coordinate vaccination programs. At the same time, medical histories are routinely recorded in most healthcare systems and are instantly available for risk assessment. Here, we demonstrate the utility of medical history in determining the risk for 1,883 diseases across clinical specialties and facilitating the rapid response to emerging health threats, using the example of COVID-19. We developed a neural network to learn disease-specific risk states from routinely collected health records of 502,460 UK Biobank participants, demonstrating risk stratification for nearly all conditions, and validated this model on 229,830 individuals from the All of Us cohort. When integrated into Cox Proportional Hazard Models, we observed significant discriminative improvements over basic demographic predictors for 1,774 (94.3%) endpoints. After transferring the unmodified risk models to the All of Us cohort, the discriminative improvements were replicated for 1,347 (89.8%) of 1,500 investigated endpoints, demonstrating model generalizability across healthcare systems and historically underrepresented groups. We then show that these risk states can be used to identify individuals vulnerable to severe COVID-19 and mortality. Our study demonstrates the currently underused potential of medical history to rapidly respond to emerging health threats by systematically estimating risk for thousands of diseases at once at minimal cost.
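The discriminative improvements reported here are differences in C-index between a baseline model and a model with the added risk state, summarized in the figure captions as a median and a bootstrapped 95% confidence interval. The sketch below illustrates that kind of summary under stated assumptions: it uses a plain Harrell's C implementation and an unadjusted percentile interval (the Bonferroni correction mentioned in some captions is omitted), and the names `c_index` and `bootstrap_delta_c` are hypothetical, not taken from the authors' code.

```python
import numpy as np

def c_index(time, event, score):
    """Harrell's C: share of comparable pairs ranked correctly by the score."""
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if score[i] > score[j]:
                    concordant += 1.0   # higher score had the earlier event
                elif score[i] == score[j]:
                    concordant += 0.5   # tied scores count as half
    return concordant / comparable

def bootstrap_delta_c(time, event, base_score, full_score, n_boot=100, seed=0):
    """Median and 95% percentile interval of C(full) - C(base) over resamples."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    base_score = np.asarray(base_score, dtype=float)
    full_score = np.asarray(full_score, dtype=float)
    rng = np.random.default_rng(seed)
    deltas = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(time), size=len(time))  # sample with replacement
        t, e = time[idx], event[idx]
        deltas.append(c_index(t, e, full_score[idx]) - c_index(t, e, base_score[idx]))
    low, median, high = np.percentile(deltas, [2.5, 50.0, 97.5])
    return median, low, high

# Synthetic example: the "full" score ranks everyone perfectly,
# the "base" score is constant and therefore uninformative.
time = np.arange(1, 31, dtype=float)
event = np.ones(30, dtype=bool)
median, low, high = bootstrap_delta_c(time, event, np.zeros(30), -time)
```

The pairwise loop is quadratic in the number of individuals, so it is only suitable for small illustrations; cohort-scale analyses would need an optimized implementation.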


Study overview
a, To learn metabolomic states from circulating blood metabolites, the eligible UK Biobank population (with NMR blood metabolomics and valid consent) was split into training, validation and test sets with 22-fold nested cross-validation based on the assigned UK Biobank assessment center. b, For each of the 22 partitions, the metabolomic state model was trained on the 168 metabolomic markers to predict metabolomic risk against 24 common disease endpoints. Subsequently, for each endpoint, CPH models were developed on the metabolomic state in combination with sets of commonly available clinical predictors to model disease risk. Predictions of the CPH model on the test set were aggregated for downstream analysis. c, The metabolomic state model was externally validated in four independent cohorts—the Whitehall II cohort and three from the BBMRI-NL consortium: the Rotterdam Study, the Leiden Longevity Study and the PROSPER cohort. d, In this study we consider clinical predictors from scores commonly applied in primary prevention. We additionally integrate variables into a comprehensive predictor set (PANEL) to investigate overlapping information with the metabolomic state. FH, family history.
Metabolomic state is associated with ORs and stratifies survival
a, Observed event frequency for incident disease plotted against metabolomic state percentiles over the entire study population for all 24 endpoints. b, Cumulative event rates over the observation time for all assessed endpoints, stratified by metabolomic state quantiles (light blue, bottom 10%; blue, median 10%; dark blue, top 10%), with 95% CIs indicated. PAD, peripheral artery disease.
Predictive value of the metabolomic state is endpoint dependent
a, Comparison of discriminative performance of CPH models trained on the metabolomic state only (MET), the three clinical predictor sets (Age+Sex, ASCVD and PANEL) and the sets’ combinations with the metabolomic state. Horizontal dashed lines indicate the median performance of the three clinical predictor sets. b, Differences in discriminative performance between the Age+Sex baseline (dashed line), metabolomic state only (blue) and the combination of Age+Sex and metabolomic state (green). c, Differences in discriminative performance between ASCVD predictors (dashed line), the combination of Age+Sex and the metabolomic state (green) and the combination of metabolomic state and ASCVD predictors (red). d, Difference in discriminative performance between comprehensive PANEL predictors (dashed line), ASCVD + MET (red) and PANEL + MET (black). a–d, Statistical measures were derived from n = 117,981 individuals; those with previous events were excluded (Supplementary Table 1). Data are presented as median (center of error bar) and 95% CI (line of error bar) determined by bootstrapping with 1,000 iterations. b–d, The x-axis range differs across panels; vertical grid lines indicate differences of 0.02 C-index.
Model calibration and additive predictive value of the metabolomic state translate to potential clinical utility
a–c, Calibration curves for CPH models, including baseline parameter sets Age+Sex, ASCVD and PANEL, as well as their combinations with the metabolomic state (Age+Sex + MET) for the endpoints T2D (a), dementia (b) and heart failure (c). d–f, Endpoint-specific net benefit curves standardized by endpoint prevalence, where horizontal solid gray lines indicate ‘treat none’ and vertical solid gray lines indicate ‘treat all’; T2D (d), dementia (e) and heart failure (f). The standardized net benefits of sets Age+Sex, ASCVD and PANEL are compared with Age+Sex + MET and additional non-laboratory predictors of PANEL (PANELnoLaboratory). Green and blue color-filled areas indicate the added benefit of the combination of the metabolomic state and Age+Sex and PANELnoLaboratory, respectively. g–i, Standardized net benefit curves comparing the performance of PANEL + MET against baselines Age+Sex, ASCVD and PANEL; T2D (g), dementia (h) and heart failure (i). Decision curves were derived from n = 111,745 (T2D), n = 117,245 (dementia) and n = 113,636 (heart failure) individuals.
Analysis of the metabolomic state informs on metabolite profiles associated with disease risk
a, Heatmap showing the importance of metabolites in regard to the estimated metabolomic states, represented by absolute global SHAP value estimates per endpoint for the 75 globally most important metabolites. Endpoints are sorted by the discriminative performance of the metabolomic state (left to right; Fig. 3a). b, Global metabolite attributions for T2D; individual attributions are aggregated by percentiles and each dot indicates one percentile. The more distant a dot from the circular baseline, the stronger the absolute attribution for that percentile. Deviations toward the center and periphery represent negative and positive contributions, respectively, to the metabolomic state. Colors indicate the metabolite’s mean plasma value. c, Global metabolite attributions for all-cause dementia. IDL, intermediate-density lipoprotein.
Metabolomic profiles predict individual multidisease outcomes

September 2022 · 486 Reads · 202 Citations · Nature Medicine

Risk stratification is critical for the early identification of high-risk individuals and disease prevention. Here we explored the potential of nuclear magnetic resonance (NMR) spectroscopy-derived metabolomic profiles to inform on multidisease risk beyond conventional clinical predictors for the onset of 24 common conditions, including metabolic, vascular, respiratory, musculoskeletal and neurological diseases and cancers. Specifically, we trained a neural network to learn disease-specific metabolomic states from 168 circulating metabolic markers measured in 117,981 participants with ~1.4 million person-years of follow-up from the UK Biobank and validated the model in four independent cohorts. We found metabolomic states to be associated with incident event rates in all the investigated conditions, except breast cancer. For 10-year outcome prediction for 15 endpoints, with and without established metabolic contribution, a combination of age and sex and the metabolomic state equaled or outperformed established predictors. Moreover, metabolomic state added predictive information over comprehensive clinical variables for eight common diseases, including type 2 diabetes, dementia and heart failure. Decision curve analyses showed that predictive improvements translated into clinical utility for a wide range of potential decision thresholds. Taken together, our study demonstrates both the potential and limitations of NMR-derived metabolomic profiles as a multidisease assay to inform on the risk of many common diseases simultaneously.
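The decision curve analysis mentioned in the abstract rests on the net benefit at a given decision threshold. As a minimal sketch, assuming binary outcome labels and predicted risks (the study itself works with 10-year time-to-event outcomes, so censoring is ignored here), the standard net benefit and its prevalence-standardized form can be computed as follows. The function names are illustrative and not taken from the paper.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], risk: Sequence[float], threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk is >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # True positives: treated individuals who experience the event.
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    # False positives: treated individuals who do not experience the event.
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def standardized_net_benefit(y_true: Sequence[int], risk: Sequence[float], threshold: float) -> float:
    """Net benefit divided by the event prevalence (maximum attainable value is 1)."""
    prevalence = sum(y_true) / len(y_true)
    return net_benefit(y_true, risk, threshold) / prevalence


# Toy data (hypothetical, for illustration only).
outcomes = [1, 0, 1, 0]
risks = [0.9, 0.2, 0.6, 0.7]
snb_model = standardized_net_benefit(outcomes, risks, 0.5)
snb_treat_all = standardized_net_benefit(outcomes, [1.0] * 4, 0.5)
```

Evaluating these functions over a grid of thresholds for a model and for the "treat all" strategy yields curves of the kind compared in a decision curve analysis.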


THER-01. Precision brain tumor therapy by AAV-mediated oncogene editing

June 2022 · 40 Reads · Neuro-Oncology

Pediatric high-grade glioma is a heterogeneous group of highly malignant tumors of the central nervous system, with a median overall survival of less than two years after diagnosis, demanding novel treatment options. One innovative approach is gene therapy, which has so far been hampered for cancer treatment owing to the lack of a system targeting tumor cells specifically. To overcome this limitation, we established a novel strategy for gene therapy, combining tumor cell-specific adeno-associated virus (AAV) variants with oncogene-specific CRISPR-Cas nucleases. We screened 177 different Cas9/gRNA combinations targeting the genes encoding H3K27M or BRAFV600E, and identified highly specific nucleases that edited the oncogenic allele but left the respective WT loci intact, which we validated by PCR amplicon sequencing. Next, we intravenously injected an AAV library engineered to encode its own capsid DNA into mice harboring patient-derived xenograft tumors driven by H3K27M or BRAFV600E. After 21 days, we resected neoplasms and separated mCherry-labeled tumor cells from normal surrounding cells by fluorescence-activated cell sorting. Using the DNA from tumor cells as template, we generated a second AAV library, which was utilized in another round of in vivo selection. At the end of each screen, DNA from tumor cells, surrounding cells, and control tissues (liver and spleen) was analyzed by amplicon sequencing. Strikingly, we identified multiple AAV variants that were highly and recurrently enriched in the analyzed tumor tissues. We are currently validating these variants by intravenously injecting selected, GFP-encoding AAVs into tumor-bearing mice and by subsequently analyzing their distribution throughout the aforementioned tissues. We will combine oncogene-specific nucleases with these validated AAV variants and analyze their anti-tumoral efficacy in a preclinical setting. Furthermore, we plan to adapt this approach to allografted mice, evaluating its feasibility and efficacy in syngeneic models.


Figure 1: Selection and characteristics of study population (A) Individuals in the UK Biobank population who withdrew consent, with missing information about their sex or with earlier records of incident myocardial infarction or stroke or lipid-lowering treatment at baseline were excluded. The remaining set was split into training, validation, and test sets in 22-fold nested cross-validation based on the assigned UK Biobank assessment centre. (B) Distribution of observation times for the derived study population. The median observation time was 11·7 years (IQR 11·0-12·3). (C) Kaplan-Meier estimates for the disease-free survival function stratified by sex. (D) Numbers at risk in 5-year intervals stratified by sex.
appendix pp 11, 15). Although we observed improvements in discriminative performance for the Cox model after addition of the PGSs as well, the NeuralCVD model remained superior in C-index (COX plus PGS 0·002, 95% CI 0·002-0·003; COX plus PGS*age 0·002, 0·002-0·003) and NRI (COX plus PGS 0·0424, 95% CI 0·0383-0·0464; COX plus PGS*age 0·0359,
Neural network-based integration of polygenic and clinical information: development and validation of a prediction model for 10-year risk of major adverse cardiac events in the UK Biobank cohort

February 2022 · 255 Reads · 33 Citations · The Lancet Digital Health

Background: In primary cardiovascular disease prevention, early identification of high-risk individuals is crucial. Genetic information allows for the stratification of genetic predispositions and lifetime risk of cardiovascular disease. However, towards clinical application, the added value over clinical predictors later in life is crucial. Currently, this genotype–phenotype relationship and implications for overall cardiovascular risk are unclear.

Methods: In this study, we developed and validated a neural network-based risk model (NeuralCVD) integrating polygenic and clinical predictors in 395 713 cardiovascular disease-free participants from the UK Biobank cohort. The primary outcome was the first record of a major adverse cardiac event (MACE) within 10 years. We compared the NeuralCVD model with both established clinical scores (SCORE, ASCVD, and QRISK3 recalibrated to the UK Biobank cohort) and a linear Cox-Model, assessing risk discrimination, net reclassification, and calibration over 22 spatially distinct recruitment centres.

Findings: The NeuralCVD score was well calibrated and improved on the best clinical baseline, QRISK3 (ΔConcordance index [C-index] 0·01, 95% CI 0·009–0·011; net reclassification improvement [NRI] 0·0488, 95% CI 0·0442–0·0534) and a Cox model (ΔC-index 0·003, 95% CI 0·002–0·004; NRI 0·0469, 95% CI 0·0429–0·0511) in risk discrimination and net reclassification. After adding polygenic scores we found further improvements on population level (ΔC-index 0·006, 95% CI 0·005–0·007; NRI 0·0116, 95% CI 0·0066–0·0159). Additionally, we identified an interaction of genetic information with the pre-existing clinical phenotype, not captured by conventional models. Additional high polygenic risk increased overall risk most in individuals with low to intermediate clinical risk, and age younger than 50 years.

Interpretation: Our results demonstrated that the NeuralCVD score can estimate cardiovascular risk trajectories for primary prevention. NeuralCVD learns the transition of predictive information from genotype to phenotype and identifies individuals with high genetic predisposition before developing a severe clinical phenotype. This finding could improve the reprioritisation of otherwise low-risk individuals with a high genetic cardiovascular predisposition for preventive interventions.

Funding: Charité–Universitätsmedizin Berlin, Einstein Foundation Berlin, and the Medical Informatics Initiative.
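The ΔC-index values reported above are differences in concordance index between two risk models. As a rough illustration, and not the authors' implementation, the sketch below computes Harrell's concordance index for right-censored data in pure Python; a ΔC-index is then the difference between the values obtained for two sets of risk scores. The function name and toy data are hypothetical.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int], scores: Sequence[float]) -> float:
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when individual i had an observed event
    (events[i] == 1) strictly before time j. It is concordant when the
    higher risk score belongs to i; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # censored individuals cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up times, event indicators, two models' risk scores.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
scores_a = [0.9, 0.7, 0.8, 0.1]
scores_b = [0.9, 0.8, 0.7, 0.1]
delta_c = concordance_index(times, events, scores_b) - concordance_index(times, events, scores_a)
```

This quadratic-time version is only meant to make the definition concrete; for cohorts of the size described here, an optimized library routine would be the practical choice.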


Figure 1. Engineering of CASANOVA-C3, a light-switchable anti-CRISPR protein for optogenetic control of NmeCas9. (A) Schematic of CASANOVA-C3 function. (B) Structure of AcrIIC3. The nine regions chosen for LOV2 domain insertion (R1-R9) are shown in red (PDB 6J9N). (C) Luciferase reporter-based screen of AcrIIC3-LOV2 hybrids. HEK293T cells were co-transfected with vectors encoding (i) a firefly luciferase reporter, (ii) NmeCas9 and a sgRNA targeting the luciferase reporter and (iii) either wild-type AcrIIC3 (AcrIIC3) or the indicated AcrIIC3-LOV2 hybrid (S11-V100) followed by luciferase assay. The AcrIIC3 residues behind which the LOV2 domain was inserted are indicated. R1-9 correspond to the different regions in B. R, region. Rep, reporter only control. The lead region is labelled in bold. (D) Lead panel of AcrIIC3-LOV2 hybrids. Glycine-serine linkers are in green. (E) Luciferase assay screen of the AcrIIC3-LOV2 hybrids in D. Cells were transfected as in C and then exposed to blue light or kept in the dark for 48 hours, followed by luciferase assay. Rep, reporter only control. (F) HEK293T cells were co-transfected with vectors encoding (i) NmeCas9 and a sgRNA targeting the endogenous IL2RG locus and (ii) the indicated Acr variant in D. Samples were exposed to blue light or kept in the dark for 72 h. Gene editing was assessed by T7 assay. Representative gel images are shown below the bar charts. The dotted line separates different gels. In, input. T7, T7 cleavage fragments. (C, E, F) Bars represent mean values, error bars the standard deviation and dots individual data points from n = 3 independent experiments.
Figure 2. Light-dependent genome editing. (A) Experimental workflow. (B, C) HEK293T cells were co-transduced with AAV vectors (B) or co-transfected with plasmids (C) encoding (i) NmeCas9 and a sgRNA targeting the indicated locus and (ii) the indicated Acr variant. Cells were then irradiated with pulsed blue light or kept in the dark for 72 hours, followed by assessment of indel frequencies using NGS (B) or TIDE sequencing (C). Plasmid mass ratios used during the transfection in C are indicated. (D) Huh7 cells were co-transduced with AAV vectors encoding (i) NmeCas9 and a sgRNA targeting the indicated locus and (ii) the respective Acr. Seventy-two hours post-transduction, editing frequencies were determined by TIDE sequencing. (B-D) Bars represent mean values, error bars the standard deviation and dots individual data points from n = 3 independent experiments.
Figure 3. The LOV2 domain in CN-C3 is located in close proximity to the NmeCas9 binding surface. (A) Analysis of AcrIIC3 residue contacts. Spatially proximate AcrIIC3 residue pairs (distance < 7 Å) are indicated by black squares (triangular plot). α-Helices and β-sheets, indicated by cylinders and arrows according to the published structure (PDB: 6J9N), are shown on the left. Regions into which the LOV2 domain was inserted into AcrIIC3 (see Figure 1B) are indicated in red and correspond to the labelled regions in the AcrIIC3 structure (lower right). The LOV2 insertion site underlying CN-C3(G) is marked in green. Numbers correspond to AcrIIC3 residues. (B) Close-up view on the identified LOV2 insertion site in context of the AcrIIC3:HNH domain complex. The approximate distance between the insertion site on AcrIIC3 and the NmeCas9 HNH domain is indicated. The angle as well as the distance between the secondary structure elements adjacent to the insertion site are shown. Residues in red mediate direct contact to the HNH domain. (C) Computational model of CN-C3 generated by domain assembly simulation. The three most populated conformational clusters of the LOV2 are shown in purple in descending order. (D) Cluster 3 does not sterically clash with the HNH-domain. PDB 6J9N, 2V0W.
Optogenetic control of Neisseria meningitidis Cas9 genome editing using an engineered, light-switchable anti-CRISPR protein

December 2020 · 195 Reads · 34 Citations · Nucleic Acids Research

Optogenetic control of CRISPR–Cas9 systems has significantly improved our ability to perform genome perturbations in living cells with high precision in time and space. As new Cas orthologues with advantageous properties are rapidly being discovered and engineered, the need for straightforward strategies to control their activity via exogenous stimuli persists. The Cas9 from Neisseria meningitidis (Nme) is a particularly small and target-specific Cas9 orthologue, and thus of high interest for in vivo genome editing applications. Here, we report the first optogenetic tool to control NmeCas9 activity in mammalian cells via an engineered, light-dependent anti-CRISPR (Acr) protein. Building on our previous Acr engineering work, we created hybrids between the NmeCas9 inhibitor AcrIIC3 and the LOV2 blue light sensory domain from Avena sativa. Two AcrIIC3-LOV2 hybrids from our collection potently blocked NmeCas9 activity in the dark, while permitting robust genome editing at various endogenous loci upon blue light irradiation. Structural analysis revealed that, within these hybrids, the LOV2 domain is located in striking proximity to the Cas9 binding surface. Together, our work demonstrates optogenetic regulation of a type II-C CRISPR effector and might suggest a new route for the design of optogenetic Acrs.


Citations (10)


... creasing precision across diverse modalities. Structured EHR data, encoded through standardized medical codes, support a wide range of applications, from personalized risk prediction (Goldstein et al., 2016; Yu et al., 2024b) and disease trajectory modeling (Jensen et al., 2017; Heumos et al., 2024) to emulation of clinical trials (Katsoulakis et al., 2024; Kraljevic et al., 2024). The cornerstone of structured EHRs is medical coding systems, which assign standardized alphanumeric codes to various aspects of patient health, including diseases, procedures, medications, and laboratory tests. ...

Reference:

Multimodal Medical Code Tokenizer
An open-source framework for end-to-end analysis of electronic health record data

Nature Medicine

... Moreover, the integration of AI with electronic health records (EHRs) provides a comprehensive overview of patient health histories, enhancing diagnostic accuracy and treatment efficacy [8]. AI-driven analysis of EHRs can identify hidden patterns across patient populations, predict disease progression, and suggest preventative measures tailored to individual health profiles [9]. This holistic approach not only improves individual patient outcomes but also contributes to broader public health strategies. ...

RETRACTED ARTICLE: Medical history predicts phenome-wide disease onset and enables the rapid response to emerging health threats

... We train all EHR models on UK Biobank data using 4-fold cross-validation, using a similar evaluation strategy to Steinfeldt et al. [27]. We created four train/test splits with disjoint test sets where approximately 75% of the data are training and 25% of the data are test sets; we split folds along assessment centers to reduce bias and data leakage due to regional stratification and assessment personnel & equipment. ...

Medical history predicts phenome-wide disease onset

... Omics sciences represent a very promising instrument to perform the analysis of patients and their biological characteristics within the dynamic context of disease evolution, thus enabling the molecular characterization of a disease onset and evolution, and providing insight into individual susceptibility to drug treatments [5][6][7][8][9][10]. Given these premises, metabolomics and lipoproteomics present themselves as compelling approaches for investigating alterations of multiple biochemical networks throughout the entire course of AD [11][12][13][14][15][16][17][18]. ...

Metabolomic profiles predict individual multidisease outcomes

Nature Medicine

... Importantly, our approach, based on routine health records, shows large discriminative improvements for the majority of diseases compared with conventionally tested biomarkers [55][56][57] and can be generalized across diverse health systems, populations, and ethnicities. However, we also see that including the medical history over age and sex deteriorated the performance for a subset of 0.7% (UK Biobank) and 5.5% (All Of Us cohort), respectively. ...

Neural network-based integration of polygenic and clinical information: development and validation of a prediction model for 10-year risk of major adverse cardiac events in the UK Biobank cohort

The Lancet Digital Health

... Toward unleashing the full potential of the Acr regulatory layer for CRISPR-Cas control, we previously developed the CASANOVA (CRISPR-Cas activity switching via a novel, optogenetic variant of Acr) concept, which facilitates switching off the activity of Acrs with blue light, thus releasing Cas9 activity (14,39). CASANOVA employs the light-oxygen-voltage 2 (LOV2) domain from Avena sativa (As) ...

Optogenetic control of Neisseria meningitidis Cas9 genome editing using an engineered, light-switchable anti-CRISPR protein

Nucleic Acids Research

... (2) blocking target DNA entry by occupying the PAM binding site (e.g., AcrIIA2, AcrIIA4 and AcrIIC5) [22,29,34,35]; (3) interacting with the Cas9 catalytic domain (e.g., AcrIIC1 and AcrIIC3) [36][37][38] to inhibit its function; and (4) other unclear CRISPR inhibitory mechanisms, such as a dimerized AcrIIA1 with a nucleic acid binding affinity [39]. ...

Computational design of anti-CRISPR proteins with improved inhibition potency

Nature Chemical Biology

... [90] Complementary, we and others showed that the exact position within the loop region can significantly affect the light switch. [91,92] Hence, evaluation of several, different insertion variants followed by systematic optimization is usually necessary to obtain a potent light switch. ...

Optogenetic control of Neisseria meningitidis Cas9 genome editing using an engineered, light-switchable anti-CRISPR protein

... We also observed many parallels between AcrIIA1 and AcrIIC1, another Type II-C Cas9 inhibitor. AcrIIC1 binds to the Cas9 HNH domain with strong affinity (KD = 6.3 nM; Harrington et al., 2017), but is a rather weak anti-CRISPR in comparison to AcrIIC3-5 (Lee et al., 2018; Mathony et al., 2019). In contrast to the narrow-spectrum DNA binding inhibitors AcrIIC3-5, AcrIIC1 blocks a broad spectrum of Type II-C orthologs by directly binding Cas9 (Apo or gRNA-bound) via the HNH domain (Harrington et al., 2017). ...

Computational design of anti-CRISPR proteins with improved inhibition potency and expanded specificity

... A residual neural network, DeeProtein, is designed for classifying protein sequences across multiple labels using one-dimensional convolutional techniques. Additionally, it introduces a novel sensitivity analysis method to effectively utilize its inherent knowledge for protein characterization and engineering purposes [238]. To achieve high precision in contact prediction, particularly with low numbers of effective sequences, DeepCov enhances protein engineering by using fully convolutional neural networks on amino acid pair frequency or covariance data from sequence alignments [239]. ...

Leveraging implicit knowledge in neural networks for functional dissection and engineering of proteins

Nature Machine Intelligence