Analyzing the heterogeneity and complexity of Electronic Health Record oriented phenotyping algorithms

Mayo Clinic, Rochester, MN, USA.
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 01/2011; 2011:274-83.
Source: PubMed


The need for formal representations of eligibility criteria for clinical trials (and for phenotyping more generally) has been recognized for some time. Indeed, a formal, computable representation that adequately captures the types of data and logic found in trial designs is a prerequisite for automatically identifying study-eligible patients from Electronic Health Records. As part of the wider process of developing such a representation, this paper reports on an analysis of fourteen Electronic Health Record-oriented phenotyping algorithms, developed as part of the eMERGE project, in terms of their constituent data elements, the types of logic they use and their temporal characteristics. We found that the majority of the eMERGE algorithms analyzed include complex, nested Boolean logic and negation, and that several depend on cardinality constraints and complex temporal logic. Insights gained from the study will be used to augment the CDISC Protocol Representation Model.
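The abstract names four logical constructs that a computable phenotype representation must support: nested Boolean logic, negation, cardinality constraints and temporal constraints. The Python sketch below illustrates one way these constructs can be composed as predicates over a patient's list of coded events. It is not taken from the paper, from any eMERGE algorithm or from the CDISC Protocol Representation Model. The `Event` record, the code labels (`DX_T2DM`, `DX_T1DM`, `RX_METFORMIN`, `LAB_HBA1C`), the 365-day window and the 6.5 threshold are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Event:
    code: str                      # hypothetical code label, e.g. "DX_T2DM"
    day: date                      # date the event was recorded
    value: Optional[float] = None  # numeric result, used only for lab events


# A criterion is a predicate over one patient's events.
Criterion = Callable[[List[Event]], bool]


def has_code(code: str, min_count: int = 1) -> Criterion:
    """Cardinality constraint: at least `min_count` events carry `code`."""
    return lambda events: sum(e.code == code for e in events) >= min_count


def lab_at_least(code: str, threshold: float) -> Criterion:
    """True if any recorded result for lab `code` is >= `threshold`."""
    return lambda events: any(
        e.code == code and e.value is not None and e.value >= threshold
        for e in events
    )


def precedes_within(first: str, second: str, max_days: int) -> Criterion:
    """Temporal constraint: some `first` event falls 0..max_days days before some `second` event."""
    def check(events: List[Event]) -> bool:
        firsts = [e.day for e in events if e.code == first]
        seconds = [e.day for e in events if e.code == second]
        return any(0 <= (s - f).days <= max_days for f in firsts for s in seconds)
    return check


def all_of(*criteria: Criterion) -> Criterion:
    """Boolean AND over child criteria (children may themselves be nested)."""
    return lambda events: all(c(events) for c in criteria)


def any_of(*criteria: Criterion) -> Criterion:
    """Boolean OR over child criteria."""
    return lambda events: any(c(events) for c in criteria)


def not_(criterion: Criterion) -> Criterion:
    """Negation (exclusion criterion)."""
    return lambda events: not criterion(events)


# Hypothetical case definition, for illustration only:
# >= 2 diagnosis codes AND (medication within 365 days after a diagnosis
# OR a lab value >= 6.5) AND NOT an excluding diagnosis.
case = all_of(
    has_code("DX_T2DM", min_count=2),
    any_of(
        precedes_within("DX_T2DM", "RX_METFORMIN", max_days=365),
        lab_at_least("LAB_HBA1C", 6.5),
    ),
    not_(has_code("DX_T1DM")),
)

if __name__ == "__main__":
    patient = [
        Event("DX_T2DM", date(2010, 1, 5)),
        Event("DX_T2DM", date(2010, 3, 1)),
        Event("RX_METFORMIN", date(2010, 4, 1)),
    ]
    print(case(patient))  # True
```

Because every criterion is a plain predicate, nesting is unbounded: an `any_of` can sit inside an `all_of` and be negated in turn. A real representation would also need to model value sets, missing data and information extracted from clinical text, none of which this sketch attempts.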



Available from: Joshua C Denny, Oct 13, 2015
    • "[4] Phenotyping employs categorizing billing codes, classifying numerical test results, computing frequency, sequential and other temporal patterns, and leveraging alternative data types depending on availability.[5] Phenotyping requires a comprehensive EHR with broad coverage of clinical observations and events, a requirement that is increasingly satisfied by current EHR deployments.[6]"
    ABSTRACT: Clinical phenotyping is an emerging research information systems capability. Research uses of electronic health record (EHR) data may require the ability to identify clinical co-morbidities and complications. Such phenotypes may not be represented directly as discrete data elements, but rather as frequency, sequential and temporal patterns in billing and clinical data. These patterns' complexity suggests the need for a robust yet flexible extract, transform and load (ETL) process that can compute them. This capability should be accessible to investigators with limited ability to engage an IT department in data management. We have developed such a system, Eureka! Clinical Analytics. It extracts data from an Excel spreadsheet, computes a broad set of phenotypes of common interest, and loads both raw and computed data into an i2b2 project. A web-based user interface allows executing and monitoring ETL processes. Eureka! is deployed at our institution and is available for deployment in the cloud.
    03/2013; 2013:203-207.
    • "As we move to large-scale mining of the EHR, defining the queries has become a bottleneck. Efforts like eMERGE[21] are showing significant progress in generating and sharing queries across institutions,[22,23] but local variations remain, and defining even a small number of phenotypes can take a group of institutions years. Despite advances in ontologies and language processing, the process remains largely unchanged since the earliest days,[24] using detective work and alchemy to get golden phenotypes from base data."
    ABSTRACT: The national adoption of electronic health records (EHR) promises to make an unprecedented amount of data available for clinical research, but the data are complex, inaccurate, and frequently missing, and the record reflects complex processes aside from the patient's physiological state. We believe that the path forward requires studying the EHR as an object of interest in itself, and that new models, learning from data, and collaboration will lead to efficient use of the valuable information currently locked in health records.
    Journal of the American Medical Informatics Association 09/2012; 20(1). DOI:10.1136/amiajnl-2012-001145
    • "This step is particularly salient in the context of eMERGE phase 2, which involves the collaboration of seven institutions and the development of 21 new phenotyping algorithms. The need to adopt a standard formal representation in the context of the eMERGE project was discussed previously in Conway et al. (2011),[3] which this paper builds upon. In order to determine which features a formal representational format would need to support, the Conway et al. paper analyzed 14 eMERGE algorithms in terms of their structure, the types of data elements they used, and the types of logic employed."
    ABSTRACT: The development of Electronic Health Record (EHR)-based phenotype selection algorithms is a non-trivial and highly iterative process involving domain experts and informaticians. To make it easier to port algorithms across institutions, it is desirable to represent them using an unambiguous formal specification language. For this purpose we evaluated the recently developed National Quality Forum (NQF) information model designed for EHR-based quality measures: the Quality Data Model (QDM). We selected 9 phenotyping algorithms that had been previously developed as part of the eMERGE consortium and translated them into QDM format. Our study concluded that the QDM contains several core elements that make it a promising format for EHR-driven phenotyping algorithms for clinical research. However, we also found areas in which the QDM could be usefully extended, such as representing information extracted from clinical text, and the ability to handle algorithms that do not consist of Boolean combinations of criteria.
    AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 01/2012; 2012:911-20.