Roland Eils’s research while affiliated with Berlin Institute of Health at Charité - Universitätsmedizin Berlin and other places


Publications (13)


Fig. 3 Scaling Behavior of Models. Number of model parameters (x-axis) and macro averaged area under receiver operating characteristic curve (AUROC) performance and 95% confidence intervals across all four task groups (y-axis). LLMs with more parameters show an increased performance. The specialized EHR foundation model, CLMBR-T-Base, is the most efficient prediction model.
Fig. 6 AUPRC Performance in Few-Shot Settings. Macro averaged area under the precision-recall curve (AUPRC) performance across subtasks for four task groups (bold). Blurred lines are averaged AUPRC values across five bootstrapped runs using different seeds [30].
Large Language Models are Powerful EHR Encoders
  • Preprint
  • File available

February 2025

·

3 Reads

·

Georg von Arnim

·

Tillmann Rheude

·

[...]

·

Electronic Health Records (EHRs) offer rich potential for clinical prediction, yet their inherent complexity and heterogeneity pose significant challenges for traditional machine learning approaches. Domain-specific EHR foundation models trained on large collections of unlabeled EHR data have demonstrated promising improvements in predictive accuracy and generalization; however, their training is constrained by limited access to diverse, high-quality datasets and inconsistencies in coding standards and healthcare practices. In this study, we explore the possibility of using general-purpose Large Language Model (LLM)-based embedding methods as EHR encoders. By serializing patient records into structured Markdown text and transforming codes into human-readable descriptors, we leverage the extensive generalization capabilities of LLMs pretrained on vast public corpora, thereby bypassing the need for proprietary medical datasets. We systematically evaluate two state-of-the-art LLM-embedding models, GTE-Qwen2-7B-Instruct and LLM2Vec-Llama3.1-8B-Instruct, across 15 diverse clinical prediction tasks from the EHRSHOT benchmark, comparing their performance to an EHR-specific foundation model, CLMBR-T-Base, and traditional machine learning baselines. Our results demonstrate that LLM-based embeddings frequently match or exceed the performance of specialized models, even in few-shot settings, and that their effectiveness scales with the size of the underlying LLM and the available context window. Overall, our findings demonstrate that repurposing LLMs for EHR encoding offers a scalable and effective approach for clinical prediction, capable of overcoming the limitations of traditional EHR modeling and facilitating more interoperable and generalizable healthcare applications.
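The serialization step mentioned in the abstract can be pictured with a minimal sketch. It is not the authors' pipeline: the function name `serialize_patient`, the Markdown layout, and the tiny `CODE_DESCRIPTIONS` lookup are assumptions made for illustration, and the event codes are toy inputs.

```python
from collections import defaultdict
from datetime import date

# Hypothetical lookup from coded events to human-readable descriptors.
# A real pipeline would draw on a full clinical vocabulary instead.
CODE_DESCRIPTIONS = {
    "ICD10/E11.9": "Type 2 diabetes mellitus without complications",
    "ICD10/I10": "Essential (primary) hypertension",
}


def serialize_patient(events):
    """Render (date, code) events as Markdown text, newest visit first."""
    by_day = defaultdict(list)
    for day, code in events:
        # Fall back to the raw code when no descriptor is known.
        by_day[day].append(CODE_DESCRIPTIONS.get(code, code))
    lines = ["# Patient record"]
    for day in sorted(by_day, reverse=True):
        lines.append(f"## Visit {day.isoformat()}")
        lines.extend(f"- {text}" for text in by_day[day])
    return "\n".join(lines)


# Toy events; "LOCAL/123" is a made-up code with no descriptor.
events = [
    (date(2020, 1, 5), "ICD10/I10"),
    (date(2021, 3, 2), "ICD10/E11.9"),
    (date(2021, 3, 2), "LOCAL/123"),
]
markdown = serialize_patient(events)
print(markdown)
```

The resulting text could then be passed to any text-embedding model; which model and which prompt format to use is outside the scope of this sketch.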


Author Correction: Medical history predicts phenome-wide disease onset and enables the rapid response to emerging health threats

February 2025

·

21 Reads


Medical history predicts phenome-wide disease onset and enables the rapid response to emerging health threats

January 2025

·

66 Reads

·

2 Citations

The COVID-19 pandemic exposed a global deficiency of systematic, data-driven guidance to identify high-risk individuals. Here, we illustrate the utility of routinely recorded medical history to predict the risk for 1741 diseases across clinical specialties and support the rapid response to emerging health threats such as COVID-19. We developed a neural network to learn from health records of 502,489 UK Biobank participants. Importantly, we observed discriminative improvements over basic demographic predictors for 1546 (88.8%) endpoints. After transferring the unmodified risk models to the All of Us cohort, we replicated these improvements for 1115 (78.9%) of 1414 investigated endpoints, demonstrating generalizability across healthcare systems and historically underrepresented groups. Ultimately, we showed how this approach could have been used to identify individuals vulnerable to severe COVID-19. Our study demonstrates the potential of medical history to support guidance for emerging pandemics by systematically estimating risk for thousands of diseases at once at minimal cost.


An open-source framework for end-to-end analysis of electronic health record data

September 2024

·

163 Reads

·

6 Citations

Nature Medicine

With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here we introduce ehrapy, a modular open-source Python framework designed for exploratory analysis of heterogeneous epidemiology and EHR data. ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrate ehrapy’s features in six distinct examples. We applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we reveal biomarkers for significant differences in survival among these groups. Additionally, we quantify medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. We reconstructed disease state trajectories in patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) based on imaging data. Finally, we conducted a case study to demonstrate how ehrapy can detect and mitigate biases in EHR data. ehrapy, thus, provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.


Overview of the study
a The medical history captures encounters with primary and secondary care, including diagnoses, medications, and procedures (ideally) from birth. Here we train a multi-layer perceptron on data before recruitment to predict phenome-wide incident disease onset for 1883 endpoints. b Location and size of the 22 assessment centers of the UK Biobank cohort across England, Wales, and Scotland. c To learn risk states from individual medical histories, the UK Biobank population was partitioned by their respective assessment center at recruitment. d For each of the 22 partitions, the Risk Model was trained to predict phenome-wide incident disease onset for 1883 endpoints. Subsequently, for each endpoint, Cox proportional hazard (CPH) models were developed on the risk states in combination with sets of commonly available predictors to model disease risk. Predictions of the CPH model on the test set were aggregated for downstream analysis. e External validation in the All of Us cohort. After mapping to the OMOP vocabulary, we transferred the trained risk model to the All of Us cohort and calculated the risk state for all endpoints. To validate these risk states, we compared the unchanged CPH models developed in the UK Biobank with refitted CPH models for age and sex. Source data are provided. The icons are made by Freepik from www.flaticon.com.
Routine health records stratify phenome-wide disease onset
a Ratio of incident events in the Top 10% compared with the Bottom 10% of the estimated risk states. Event rates in the Top 10% are higher than in the Bottom 10% for all but one of the 1883 investigated endpoints. Red dots indicate 24 selected endpoints detailed in Fig. 2b. To illustrate, 1198 (2.39%) individuals in the top risk decile for cardiac arrest experienced an event compared with only 30 (0.06%) in the bottom decile, with a risk ratio of 39.93. b Incident event rates for each medical history risk percentile (if medical history was available) for a selection of 24 endpoints. c Cumulative event rates with 95% confidence intervals for the Top 1%, median, and Bottom 1% of risk percentiles in b over 15 years. Statistical measures were derived from 502,460 individuals. Individuals with prevalent diseases were excluded from the endpoint-specific analysis. Source data are provided.
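The cardiac arrest figures quoted in panel a can be checked with a few lines of arithmetic. This is only a sketch: the caption does not give the decile sizes, so the equal group size of 50,000 used here is an assumption chosen because it reproduces the quoted percentages, and `risk_ratio` is a helper name introduced for illustration.

```python
def risk_ratio(events_top, n_top, events_bottom, n_bottom):
    """Ratio of the event rate in the top group to that in the bottom group."""
    return (events_top / n_top) / (events_bottom / n_bottom)


# Assumed equal decile size; the caption reports only counts and percentages.
decile_size = 50_000

rate_top = 1198 / decile_size      # about 2.4% of the top decile
rate_bottom = 30 / decile_size     # 0.06% of the bottom decile
rr = risk_ratio(1198, decile_size, 30, decile_size)

print(f"top: {rate_top:.2%}, bottom: {rate_bottom:.2%}, risk ratio: {rr:.2f}")
```

With equal group sizes the ratio reduces to 1198 / 30, which matches the 39.93 stated in the caption.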
Discriminative performance indicates potential utility
a Differences in discriminatory performance quantified by the C-Index between CPH models trained on Age+Sex and Age+Sex+MedicalHistory for all 1883 endpoints. We found significant improvements over the baseline model (Age+Sex, age, and biological sex only) for 1774 (94.2%) of the 1883 investigated endpoints. Red dots indicate selected endpoints in Fig. 3b. b Absolute discriminatory performance in terms of C-Index comparing the baseline (Age+Sex, black point) with the added routine health records risk state (Age+Sex+RiskState, red point) for a selection of 24 endpoints. c The direct C-index differences for the same models. Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. d Example of individual predicted phenome-wide risk profile. Predisposition (10-year risk estimated by Age+Sex+RiskState compared to risk estimated by Age+Sex alone) is displayed in the inner circle, and absolute 10-year risk estimated by Age+Sex+RiskState can be found in the outer circle. Labels indicate endpoints with a high individual predisposition (>2 times higher than the Age+Sex-based reference estimate) and absolute 10-year risk > 10%. e Top 5 highest attributed records for selected endpoints. Statistical measures were derived from 502,460 individuals. Source data are provided.
Predictive models can generalize across healthcare systems and populations
a External validation of the differences in discriminatory performance quantified by the C-Index between CPH models trained on age and biological sex and age, biological sex, and the risk state for 1568 endpoints in the All of Us cohort. We find significant improvements over the baseline model (age and biological sex only) for 1347 (85.9%) of the 1568 investigated endpoints. b Direct comparison of the absolute C-Index in the UK Biobank (x-axis) and the All of Us cohort (y-axis). Significant improvements can be replicated for 1347 (89.8%, green points) of 1500 endpoints in the All of Us cohort. c Comparison of mean delta C-Index per delta percentile (derived from the UK Biobank from the 1568 endpoints available in All of Us). Improvements in the All of Us cohort are consistent with the UK Biobank cohort: Small improvements in the UK Biobank tend to be larger in All of Us, while large improvements in the UK Biobank tend to be attenuated in All of Us. d Distribution of C-Indices for the 1568 investigated endpoints stratified by communities historically underrepresented in biomedical research (UPD)⁷³. Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. e For the same groups, confidence intervals for the additive performance as measured by the C-Index compared to the baseline model. Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. f Absolute discriminatory performance in terms of C-Index comparing the baseline (age and biological sex, black point) with the added routine health records risk state (red points) for a selection of 24 endpoints. g The differences in C-index for the same models. Statistical measures for UKB (in b and c) were derived from 502,460 individuals and for AoU (in a–g) were derived from 229,830 individuals.
Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. Source data are provided.
Predictions can support cardiovascular disease prevention and the response to emerging health threats
a Discriminatory performances in terms of absolute C-Indices comparing risk scores (Age+Sex, SCORE2, ASCVD, and QRISK as indicated, black point) with the risk model based on Age+Sex+RiskState (red segment). b Direct differences between risk scores (Age+Sex, SCORE2, ASCVD, and QRISK as indicated) and the risk model based on Age+Sex+RiskState in terms of C-index. Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. c Estimated cumulative event trajectories, including 95% confidence intervals of severe (with hospitalization) and fatal (death registry) COVID-19 outcomes stratified by the Top, Median, and Bottom 5% based on age (left) or risk states of pneumonia, sepsis, and all-cause mortality as estimated by Kaplan-Meier analysis. Statistical measures were derived from 502,460 individuals. Source data are provided.
RETRACTED ARTICLE: Medical history predicts phenome-wide disease onset and enables the rapid response to emerging health threats

May 2024

·

135 Reads

·

3 Citations

The COVID-19 pandemic exposed a global deficiency of systematic, data-driven guidance to identify high-risk individuals. Here, we illustrate the utility of routinely recorded medical history to predict the risk for 1883 diseases across clinical specialties and support the rapid response to emerging health threats such as COVID-19. We developed a neural network to learn from health records of 502,460 UK Biobank participants. Importantly, we observed discriminative improvements over basic demographic predictors for 1774 (94.3%) endpoints. After transferring the unmodified risk models to the All of Us cohort, we replicated these improvements for 1347 (89.8%) of 1500 investigated endpoints, demonstrating generalizability across healthcare systems and historically underrepresented groups. Ultimately, we showed how this approach could have been used to identify individuals vulnerable to severe COVID-19. Our study demonstrates the potential of medical history to support guidance for emerging pandemics by systematically estimating risk for thousands of diseases at once at minimal cost.


A predictive atlas of disease onset from retinal fundus photographs

March 2024

·

133 Reads

Early detection of high-risk individuals is crucial for healthcare systems to cope with changing demographics and an ever-increasing patient population. Images of the retinal fundus are a non-invasive, low-cost examination routinely collected and potentially scalable beyond ophthalmology. Prior work demonstrated the potential of retinal images for risk assessment for common cardiometabolic diseases, but it remains unclear whether this potential extends to a broader range of human diseases. Here, we extended a retinal foundation model (RETFound) to systematically explore the predictive potential of retinal images as a low-cost screening strategy for disease onset across >750 incident diseases in >60,000 individuals. For more than a third (n=308) of the diseases, we demonstrated improved discriminative performance compared to readily available patient characteristics. This included 281 diseases outside of ophthalmology, such as type 2 diabetes (Delta C-Index: UK Biobank +0.073 (0.068, 0.079)) or chronic obstructive pulmonary disease (Delta C-Index: UK Biobank +0.047 (0.039, 0.054)), showcasing the potential of retinal images to complement screening strategies more widely. Moreover, we externally validated these findings in 7,248 individuals from the EPIC-Norfolk Eye Study. Notably, retinal information did not improve the prediction for the onset of cardiovascular diseases compared to established primary prevention scores, demonstrating the need for rigorous benchmarking and disease-agnostic efforts to design cost-efficient screening strategies to improve population health. We demonstrated that predictive improvements were attributable to retinal vascularisation patterns and less obvious features, such as eye colour or lens morphology, by extracting image attributions from risk models and performing genome-wide association studies, respectively.
Genetic findings further highlighted commonalities between eye-derived risk estimates and complex disorders, including novel loci, such as IMAP1, for iron homeostasis. In conclusion, we present the first comprehensive evaluation of predictive information derived from retinal fundus photographs, illustrating the potential and limitations of easily accessible and low-cost retinal images for risk assessment across common and rare diseases.


Exploratory electronic health record analysis with ehrapy

December 2023

·

97 Reads

With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here, we introduce ehrapy, a modular open-source Python framework designed for exploratory end-to-end analysis of heterogeneous epidemiology and electronic health record data. Ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference, and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrated ehrapy's features in five distinct examples: We first applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we revealed biomarkers for significant differences in survival among these groups. Additionally, we quantified medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. Finally, we reconstructed disease state trajectories in SARS-CoV-2 patients based on imaging data. Ehrapy thus provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.


Eos and OMOCL: Towards a seamless integration of openEHR records into the OMOP Common Data Model

July 2023

·

135 Reads

·

7 Citations

Journal of Biomedical Informatics

Background: The reuse of data from electronic health records (EHRs) for research purposes promises to improve the data foundation for clinical trials and may even help enable them. Nevertheless, EHRs are heterogeneous in both structure and semantics. To standardize this data for research, the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standard has recently seen an increase in use. However, the conversion of these EHRs into the OMOP CDM requires complex and resource-intensive Extract, Transform, and Load (ETL) processes. This hampers the reuse of clinical data for research. To solve the issues of heterogeneity of EHRs and the lack of semantic precision on the care site, the openEHR standard has recently seen wider adoption. A standardized process to integrate openEHR records into the CDM potentially lowers the barriers to making EHRs accessible for research. Yet, a comprehensive approach to the integration of openEHR records into the OMOP CDM has not yet been proposed. Methods: We analysed both standards and compared their models to identify possible mappings. Based on this, we defined the necessary processes to transform openEHR records into CDM tables. We also discuss the limitation of openEHR with its unspecific demographics model and propose two possible solutions. Results: We developed the OMOP Conversion Language (OMOCL), which enabled us to define a declarative openEHR archetype-to-CDM mapping language. Using OMOCL, it was possible to define a set of mappings. As a proof-of-concept, we implemented the Eos tool, which uses the OMOCL files to successfully automate the ETL from real-world and sample EHRs into the OMOP CDM. Discussion: Both Eos and OMOCL provide a way to define generic mappings for an integration of openEHR records into OMOP. Thus, they represent a significant step towards achieving interoperability between the clinical and the research data domains.
However, the transformation of openEHR data into the less expressive OMOP CDM leads to a loss of semantics.


Machine Learning for Medical Data Integration

May 2023

·

210 Reads

·

6 Citations

Making health data available for secondary use enables innovative data-driven medical research. Since modern machine learning (ML) methods and precision medicine require extensive amounts of data covering most of the standard and edge cases, it is essential to initially acquire large datasets. This can typically only be achieved by integrating different datasets from various sources and sharing data across sites. To obtain a unified dataset from heterogeneous sources, standard representations and Common Data Models (CDM) are needed. The process of mapping data into these standardized representations is usually very tedious and requires many manual configuration and refinement steps. A potential way to reduce these efforts is to use ML methods not only for data analysis, but also for the integration of health data on the syntactic, structural, and semantic level. However, research on ML-based medical data integration is still in its infancy. In this article, we describe the current state of the literature and present selected methods that appear to have a particularly high potential to improve medical data integration. Moreover, we discuss open issues and possible future research directions.


Software-Tool Support for Collaborative, Virtual, Multi-Site Molecular Tumor Boards

April 2023

·

134 Reads

·

6 Citations

SN Computer Science

The availability of high-throughput molecular diagnostics builds the foundation for Molecular Tumor Boards (MTBs). Although more fine-grained data is expected to support decision making of oncologists, assessment of data is complex and time-consuming, slowing down the implementation of MTBs, e.g., due to retrieval of the latest medical publications, assessment of clinical evidence, or linkage to the latest clinical guidelines. We share our findings from analysis of existing tumor board processes and definition of clinical processes for the adoption of MTBs. Building on our findings, we have developed a real-world software prototype together with oncologists and medical professionals, which supports the preparation and conduct of MTBs and enables collaboration between medical experts by sharing medical knowledge even across hospital locations. We worked in interdisciplinary teams of clinicians, oncologists, medical experts, medical informaticians, and software engineers using design thinking methodology. With their input, we identified challenges and limitations of the current MTB approaches, derived clinical process models using Business Process Model and Notation (BPMN), and defined personas, functional and non-functional requirements for software tool support. Based on it, we developed software prototypes and evaluated them with clinical experts from major university hospitals across Germany. We extended the Kanban methodology, enabling holistic tracking of patient cases from “backlog” to “follow-up” in our app. The feedback from interviewed medical professionals showed that our clinical process models and software prototype provide suitable process support for the preparation and conduct of molecular tumor boards. The combination of oncology knowledge across hospitals and the documentation of treatment decisions can be used to form a unique medical knowledge base by oncologists for oncologists.
Due to the high heterogeneity of tumor diseases and the spread of the latest medical knowledge, a cooperative decision-making process including insights from similar patient cases was considered as a very valuable feature. The ability to transform prepared case data into a screen presentation was recognized as an essential feature speeding up the preparation process. Oncologists require special software tool support to incorporate and assess molecular data for the decision-making process. In particular, the need for linkage to the latest medical knowledge, clinical evidence, and collaborative tools to discuss individual cases were named to be of importance. With the experiences from the COVID-19 pandemic, the acceptance of online tools and collaborative working is expected to grow. Our virtual multi-site approach proved to allow a collaborative decision-making process for the first time, which we consider to have a positive impact on the overall treatment quality.


Citations (8)


... This flexibility is particularly valuable given the persistent challenges in EHR data interoperability, stemming from variable coding practices, privacy constraints, and regulatory limitations, which often hinder the aggregation of large, standardized datasets for model pretraining [35]. Moreover, since LLMs are pre-trained on broad and diverse text corpora, including medical literature and case reports, they are well-equipped to capture the semantics of rare or underrepresented clinical concepts [36]. This capacity mitigates the limitations of count-based models that typically filter out infrequent events, ensuring that even rare but clinically significant phenomena are effectively encoded [37]. ...

Reference:

Large Language Models are Powerful EHR Encoders
Medical history predicts phenome-wide disease onset and enables the rapid response to emerging health threats

... Moreover, the dynamic nature of resistance development necessitates datasets that capture temporal trends and patient-specific factors influencing AMR. Real-world data from electronic health records (EHR) offer a valuable opportunity to address this gap by providing granular information on microbial cultures, patient characteristics, and treatment outcomes [6], [7], [8], [9]. Yet, creating meaningful and reliable datasets from EHR data presents several challenges, including heterogeneity in data representation, the need for rigorous de-identification, and ensuring data quality and interpretability. ...

An open-source framework for end-to-end analysis of electronic health record data

Nature Medicine

... Moreover, the integration of AI with electronic health records (EHRs) provides a comprehensive overview of patient health histories, enhancing diagnostic accuracy and treatment efficacy [8]. AI-driven analysis of EHRs can identify hidden patterns across patient populations, predict disease progression, and suggest preventative measures tailored to individual health profiles [9]. This holistic approach not only improves individual patient outcomes but also contributes to broader public health strategies. ...

RETRACTED ARTICLE: Medical history predicts phenome-wide disease onset and enables the rapid response to emerging health threats

... However, its flexibility can at times lead to variations in data representation that impede semantic interoperability and data consistency between different sources [14]. OMOP CDM is designed for observational healthcare data, providing a standardized approach to organizing and querying clinical data, especially for research and analysis purposes [15]. Additionally, it is compatible with HL7 FHIR, allowing for the creation of a cohesive data ecosystem that covers both research-oriented and clinical use case scenarios [16]. ...

Eos and OMOCL: Towards a seamless integration of openEHR records into the OMOP Common Data Model

Journal of Biomedical Informatics

... We train all EHR models on UK Biobank data using 4-fold cross-validation, using a similar evaluation strategy to Steinfeldt et al. [27]. We created four train/test splits with disjoint test sets where approximately 75% of the data are training and 25% of the data are test sets; we split folds along assessment centers to reduce bias and data leakage due to regional stratification and assessment personnel & equipment. ...

Medical history predicts phenome-wide disease onset
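The center-wise splitting described in the excerpt above can be sketched as follows. This is an illustrative assumption rather than the cited authors' code: the helper names `center_folds` and `split_indices`, the round-robin assignment of centers to folds, and the toy center labels are all made up here, and the round-robin step does not balance fold sizes toward the roughly 75%/25% proportions the excerpt mentions.

```python
def center_folds(centers, n_folds=4):
    """Assign every distinct center to exactly one fold (round-robin)."""
    folds = [set() for _ in range(n_folds)]
    for i, center in enumerate(sorted(set(centers))):
        folds[i % n_folds].add(center)
    return folds


def split_indices(centers, test_centers):
    """Return (train, test) row indices; a center never appears in both."""
    train = [i for i, c in enumerate(centers) if c not in test_centers]
    test = [i for i, c in enumerate(centers) if c in test_centers]
    return train, test


# Toy assessment-center label per participant (hypothetical values).
centers = ["A", "B", "C", "D", "A", "B", "E", "F", "C", "A"]
folds = center_folds(centers, n_folds=4)

for k, test_centers in enumerate(folds):
    train, test = split_indices(centers, test_centers)
    print(f"fold {k}: test centers {sorted(test_centers)}, "
          f"{len(train)} train rows, {len(test)} test rows")
```

Because each center belongs to a single fold, the four test sets are disjoint and no participant from a test center contributes to training in that fold.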

... For example, epigenetic aging biomarkers that are based on DNA methylation values at specific CpG sites have only weak correlations with metabolomics-based aging biomarkers, correlations ranging from −0.22 to 0.21 for MetaboAge and from 0 to 0.32 for MetaboHealth (Kuiper et al. 2023). The advantage of metabolomics-based aging biomarkers compared to other omics-based biomarkers lies in the fact that the metabolome carries more systemic information from multiple tissues across the body than, e.g., the methylome or transcriptome (Buergel et al. 2022). In addition, metabolomics-based biomarkers are trained on large sample sizes (Rutledge et al. 2022), and in recent years, nuclear magnetic resonance (NMR)-based metabolomics have matured and are now available at lower cost (Wishart et al. 2022). ...

Metabolomic profiles predict individual multidisease outcomes

Nature Medicine