Michael G Kahn’s research while affiliated with University of Colorado and other places

Publications (128)


Linkability measures to assess the data characteristics for record linkage
  • Article

September 2024 · 2 Reads

Journal of the American Medical Informatics Association

Andrew Hill · Michael G Kahn · [...]

Objectives: Accurate record linkage (RL) enables consolidation and de-duplication of data from disparate datasets, resulting in more comprehensive and complete patient data. However, conducting RL with low-quality or unfit data can waste institutional resources on poor linkage results. We aim to evaluate data linkability to enhance the effectiveness of record linkage. Materials and Methods: We describe a systematic approach using data fitness (“linkability”) measures, defined as metrics that characterize the availability, discriminatory power, and distribution of potential variables for RL. We used the isolation forest algorithm to detect abnormal linkability values from 188 sites in Indiana and Colorado, and manually reviewed the data to understand the cause of anomalies. Results: We calculated 10 linkability metrics for 11 potential linkage variables (LVs) across 188 sites for a total of 20 680 linkability metrics. Potential LVs such as first name, last name, date of birth, and sex have low missing data rates, while Social Security Number varies widely in completeness among all sites. We investigated anomalous linkability values to identify the cause of many records having identical values in certain LVs, issues with placeholder values disguising data missingness, and orphan records. Discussion: The fitness of a variable for RL is determined by its availability and its discriminatory power to uniquely identify individuals. These results highlight the need for awareness of placeholder values, which inform the selection of variables and methods to optimize RL performance. Conclusion: Evaluating linkability measures using the isolation forest algorithm to highlight anomalous findings can help identify fitness-for-use issues that must be addressed before initiating the RL process to ensure high-quality linkage outcomes.
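As a rough illustration of what availability, discriminatory power, and distribution measures for a single linkage variable might look like, here is a minimal Python sketch. The specific metrics (missing rate, distinct ratio, Shannon entropy, largest group share), the placeholder list, and the names `PLACEHOLDERS` and `linkability_metrics` are assumptions made for illustration; they are not the 10 metrics used in the paper, and the isolation forest step is not shown.

```python
import math
from collections import Counter

# Hypothetical placeholder values that can disguise missing data (an assumption).
PLACEHOLDERS = {"", "unknown", "n/a", "000-00-0000", "999-99-9999"}


def linkability_metrics(values):
    """Return illustrative fitness measures for one candidate linkage variable."""
    total = len(values)
    # Treat None and known placeholder strings as missing.
    present = [
        v for v in values
        if v is not None and str(v).strip().lower() not in PLACEHOLDERS
    ]
    counts = Counter(present)
    n = len(present)
    # Shannon entropy (bits) of the observed value distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    return {
        "missing_rate": 1 - n / total if total else 0.0,               # availability
        "distinct_ratio": len(counts) / n if n else 0.0,               # discriminatory power
        "entropy_bits": entropy,                                       # distribution
        "max_group_share": max(counts.values()) / n if n else 0.0,     # identical-value clusters
    }


# Synthetic example values, not real identifiers.
ssn = ["123-45-6789", "000-00-0000", None, "987-65-4321", "999-99-9999", "123-45-6789"]
metrics = linkability_metrics(ssn)

# The abstract's total: 10 metrics x 11 linkage variables x 188 sites.
total_metrics = 10 * 11 * 188
```

Per-site vectors of such metrics could then be passed to an anomaly detector; which detector settings the authors used is not stated in the abstract.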


Figure 1. Level 1 logical data flow diagram. Yellow components perform the OMOP-to-FHIR transformation and FHIR server upload. Green components perform the Bulk FHIR export and FHIR-based ETL into the MENDS database.
Figure 2. Structure of "OMOP JSON" extracted from OMOP CDM V5.3 queries using synthetic data based on the OMOP Condition_Occurrence table. The person_id, provider_id, and condition_start/end date fields do not refer to actual values.
Figure 3. Whistle transformation specification for creating FHIR R4 Person resource from OMOP CDM V5.3 Patient record. Functions such as USCore_Birthsex() use a unique feature of the Whistle transformation language that calls FHIR concept maps to convert OMOP-specific concept_ids into US Core IG-compliant CodeableConcepts.
Figure 4. FHIR ConceptMap maps OMOP-specific concept_ids for patient sex into FHIR US Core compliant values.
Figure 5. Multiple JSON Codings in a FHIR code element enable inclusion of both local source and FHIR-required values. In this example, both FHIR-required RxNorm (red box) and local NDC source codes (green box) are included in 2 Coding objects in the Medication.code JSON object.

MENDS-on-FHIR: leveraging the OMOP common data model and FHIR standards for national chronic disease surveillance
  • Article
  • Full-text available

May 2024 · 163 Reads

JAMIA Open

Objectives: The Multi-State EHR-Based Network for Disease Surveillance (MENDS) is a population-based chronic disease surveillance distributed data network that uses institution-specific extraction-transformation-load (ETL) routines. MENDS-on-FHIR examined using Health Level Seven’s Fast Healthcare Interoperability Resources (HL7® FHIR®) and US Core Implementation Guide (US Core IG) compliant resources derived from the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to create a standards-based ETL pipeline. Materials and Methods: The input data source was a research data warehouse containing clinical and administrative data in OMOP CDM Version 5.3 format. OMOP-to-FHIR transformations, using a unique JavaScript Object Notation (JSON)-to-JSON transformation language called Whistle, created FHIR R4 V4.0.1/US Core IG V4.0.0 conformant resources that were stored in a local FHIR server. A REST-based Bulk FHIR $export request extracted FHIR resources to populate a local MENDS database. Results: Eleven OMOP tables were used to create 10 FHIR/US Core compliant resource types. A total of 1.13 trillion resources were extracted and inserted into the MENDS repository. A very low rate of non-compliant resources was observed. Discussion: OMOP-to-FHIR transformation results passed validation with less than a 1% non-compliance rate. These standards-compliant FHIR resources provided standardized data elements required by the MENDS surveillance use case. The Bulk FHIR application programming interface (API) enabled population-level data exchange using interoperable FHIR resources. The OMOP-to-FHIR transformation pipeline creates a FHIR interface for accessing OMOP data. Conclusion: MENDS-on-FHIR successfully replaced custom ETL with standards-based interoperable FHIR resources using Bulk FHIR. The OMOP-to-FHIR transformations provide an alternative mechanism for sharing OMOP data.
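The REST-based Bulk FHIR $export request mentioned in the abstract can be sketched as follows. This is a minimal Python illustration of a system-level kick-off request as described in the FHIR Bulk Data Access specification (asynchronous `Prefer: respond-async` header, optional `_type` filter). The base URL, the resource types chosen, and the helper name `build_bulk_export_request` are hypothetical; the request is only constructed, not sent, and nothing here reflects the MENDS-on-FHIR project's actual client code.

```python
from urllib.request import Request


def build_bulk_export_request(base_url, resource_types=None):
    """Build (but do not send) a system-level Bulk Data $export kick-off request."""
    url = base_url.rstrip("/") + "/$export"
    if resource_types:
        # _type takes a comma-delimited list of FHIR resource types.
        url += "?_type=" + ",".join(resource_types)
    return Request(
        url,
        method="GET",
        headers={
            "Accept": "application/fhir+json",
            "Prefer": "respond-async",  # ask the server to process the export asynchronously
        },
    )


# Hypothetical server address and resource selection, for illustration only.
req = build_bulk_export_request("https://fhir.example.org/fhir", ["Patient", "Condition"])
```

In the Bulk Data flow, a server that accepts such a request typically replies with 202 Accepted and a Content-Location header that the client polls until the export files are ready for download.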

An open source knowledge graph ecosystem for the life sciences

April 2024 · 323 Reads · 15 Citations

Scientific Data

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.



Figure 1. Cumulative enrollment and sample collection of CCPM Biobank participants over time. Enrollment shown in blue and sample collection in red.
Figure 5. Replication of known associations in the CCPM Biobank across a range of traits, comparing CCPM Biobank findings with REGENIE to those found in the GWAS catalog. Error bars represent the 95% confidence interval for the odds ratio. For all, the risk-increasing allele is compared, and with multiple reporters, the largest dataset in the GWAS catalog was used for reference. OMIM identifiers for nearest genes are as follows: APOE (MIM: 107741), SMAD3 (MIM: 603109), TERT (MIM: 187270), HCG22 (MIM: 613918), PTCSC2 (MIM: N/A), HLA-DRA (MIM: 142860), FTO (MIM: 610966), HCP5 (MIM: 604676), HLA-DRB1 (MIM: 142857), HLA-DQB1 (MIM: 604305), and TCF7L2 (MIM: 602228).
Building a vertically integrated genomic learning health system: The biobank at the Colorado Center for Personalized Medicine

January 2024 · 106 Reads · 18 Citations

The American Journal of Human Genetics

Precision medicine initiatives across the globe have led to a revolution of repositories linking large-scale genomic data with electronic health records, enabling genomic analyses across the entire phenome. Many of these initiatives focus solely on research insights, leading to limited direct benefit to patients. We describe the biobank at the Colorado Center for Personalized Medicine (CCPM Biobank) that was jointly developed by the University of Colorado Anschutz Medical Campus and UCHealth to serve as a unique, dual-purpose research and clinical resource accelerating personalized medicine. This living resource currently has more than 200,000 participants with ongoing recruitment. We highlight the clinical, laboratory, regulatory, and HIPAA-compliant informatics infrastructure along with our stakeholder engagement, consent, recontact, and participant engagement strategies. We characterize aspects of genetic and geographic diversity unique to the Rocky Mountain region, the primary catchment area for CCPM Biobank participants. We leverage linked health and demographic information of the CCPM Biobank participant population to demonstrate the utility of the CCPM Biobank to replicate complex trait associations in the first 33,674 genotyped individuals across multiple disease domains. Finally, we describe our current efforts toward return of clinical genetic test results, including high-impact pathogenic variants and pharmacogenetic information, and our broader goals as the CCPM Biobank continues to grow. Bringing clinical and research interests together fosters unique clinical and translational questions that can be addressed from the large EHR-linked CCPM Biobank resource within a HIPAA- and CLIA-certified environment.


Sustained Effect of Clinical Decision Support for Heart Failure: A Natural Experiment Using Implementation Science

October 2023 · 15 Reads · 3 Citations

Applied Clinical Informatics

Objectives: In a randomized controlled trial, we found that applying implementation science (IS) methods and best practices in clinical decision support (CDS) design to create a locally customized, “enhanced” CDS significantly improved evidence-based prescribing of β blockers (BB) for heart failure compared with an unmodified commercially available CDS. At trial conclusion, the enhanced CDS was expanded to all sites. The purpose of this study was to evaluate the real-world sustained effect of the enhanced CDS compared with the commercial CDS. Methods: In this natural experiment of 28 primary care clinics, we compared clinics exposed to the commercial CDS (preperiod) to clinics exposed to the enhanced CDS (both periods). The primary effectiveness outcome was the proportion of alerts resulting in a BB prescription. Secondary outcomes included patient reach and clinician adoption (dismissals). Results: There were 367 alerts for 183 unique patients and 171 unique clinicians (pre: March 2019–August 2019; post: October 2019–March 2020). The enhanced CDS increased prescribing by 26.1% compared with the commercial (95% confidence interval [CI]: 17.0–35.1%), which is consistent with the 24% increase in the previous study. The odds of adopting the enhanced CDS was 81% compared with 29% with the commercial (odds ratio: 4.17, 95% CI: 1.96–8.85). The enhanced CDS adoption and effectiveness rates were 62 and 14% in the preperiod and 92 and 10% in the postperiod. Conclusion: Applying IS methods with CDS best practices was associated with improved and sustained clinician adoption and effectiveness compared with a commercially available CDS tool.


Figure 2. Structure of "OMOP JSON" extracted from OMOP CDM V5.3 queries using synthetic data based on the OMOP Condition_Occurrence table. The person_id, provider_id, and condition_start/end date fields do not refer to actual values.
Figure 3. Whistle transformation specification for creating FHIR R4 Person resource from OMOP CDM V5.3 Patient record. Functions such as USCore_Birthsex() use a unique feature of the Whistle transformation language that calls FHIR concept maps to convert OMOP-specific concept_ids into US Core IG-compliant CodeableConcepts. One unique feature of the Whistle mapping language is a built-in function focused on code harmonization using local FHIR ConceptMap resources or remote FHIR terminology services. For example, the Whistle function USCore_Birthsex() in Figure 3
Figure 4. FHIR ConceptMap maps OMOP-specific concept_ids for patient sex into FHIR US Core compliant values.
Figure 5. Multiple JSON Codings in a FHIR code element enable inclusion of both local source and FHIR-required values. In this example, both FHIR-required RxNorm (red box) and local NDC source codes (green box) are included in two Coding objects in the Medication.code JSON object.
MENDS-on-FHIR: Leveraging the OMOP common data model and FHIR standards for national chronic disease surveillance

August 2023 · 1,688 Reads

Objective: The Multi-State EHR-Based Network for Disease Surveillance (MENDS) is a population-based chronic disease surveillance distributed data network. Current data partners create institution-specific extraction-transformation-load (ETL) routines. MENDS-on-FHIR provides a standards-based ETL approach using Health Level Seven’s Fast Healthcare Interoperability Resources (HL7® FHIR®) and US Core Implementation Guide (US Core IG) compliant resources derived from the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). Materials and Methods: The input data source was a research data warehouse (RDW) containing clinical and administrative data in OMOP CDM Version 5.3 format. OMOP-to-FHIR transformations, using a unique JavaScript Object Notation (JSON)-to-JSON language called Whistle, created FHIR R4.0.1/US Core IG V4.0.0 conformant resources that were stored in a local FHIR server. A REST-based Bulk FHIR $export request extracted FHIR resources to populate a local MENDS database. Results: Eleven OMOP tables were used to create 10 FHIR/US Core compliant resource types. A total of 1.13 trillion resources were extracted and inserted into the MENDS repository. A very low rate of non-compliant resources was observed. Discussion: OMOP-to-FHIR transformations passed validation with only minimal non-compliance issues. These resources provided the clinical and administrative data elements required by the MENDS surveillance use case. The Bulk FHIR application programming interface (API) enabled population-level data exchange using interoperable FHIR resources. The OMOP-to-FHIR transformation pipeline creates a FHIR “facade” for accessing OMOP data. Conclusion: MENDS-on-FHIR successfully replaced custom ETL with standards-based interoperable FHIR resources using Bulk FHIR. The OMOP-to-FHIR transformations provide an alternative mechanism for sharing OMOP data.
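The concept-map-based code harmonization described in the figure captions above can be approximated in Python. The sketch below uses a plain dictionary as a stand-in for a FHIR ConceptMap and builds a CodeableConcept that carries both the mapped code and the local source code. It is not Whistle and not the project's mapping: the OMOP concept_ids (8507, 8532), the target code systems, the local system URI, and the function names are assumptions that should be checked against the OMOP vocabulary and the US Core IG before any real use.

```python
# Hypothetical lookup standing in for a FHIR ConceptMap (illustration only).
# Keys are assumed OMOP gender concept_ids; values are (system, code, display).
GENDER_SYSTEM = "http://terminology.hl7.org/CodeSystem/v3-AdministrativeGender"
NULL_FLAVOR_SYSTEM = "http://terminology.hl7.org/CodeSystem/v3-NullFlavor"

OMOP_SEX_TO_BIRTHSEX = {
    8507: (GENDER_SYSTEM, "M", "Male"),
    8532: (GENDER_SYSTEM, "F", "Female"),
}
UNKNOWN = (NULL_FLAVOR_SYSTEM, "UNK", "Unknown")


def translate_sex(concept_id):
    """Map an OMOP concept_id to a FHIR Coding, falling back to 'unknown'."""
    system, code, display = OMOP_SEX_TO_BIRTHSEX.get(concept_id, UNKNOWN)
    return {"system": system, "code": code, "display": display}


def codeable_concept(*codings):
    """Wrap one or more Codings in a CodeableConcept-shaped dictionary."""
    return {"coding": list(codings)}


birthsex = translate_sex(8532)
# Hypothetical local system URI used to retain the source value alongside the mapped one.
local = {"system": "urn:example:omop-concept-id", "code": "8532"}
concept = codeable_concept(birthsex, local)
```

Keeping the source coding next to the mapped coding mirrors the idea in the Figure 5 caption, where a required terminology and a local source code sit side by side in one code element.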


An Open-Source Knowledge Graph Ecosystem for the Life Sciences

July 2023 · 613 Reads

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to automatically construct them. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoints and abstraction algorithms), and benchmarks (e.g., prebuilt KGs and embeddings). We evaluate the ecosystem by surveying open-source KG construction methods and analyzing its computational performance when constructing 12 large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.



Data Quality in Clinical Research

June 2023 · 52 Reads · 6 Citations

Every scientist knows that research results are only as good as the data upon which the conclusions were formed. However, most scientists receive no training in methods for achieving, assessing, or controlling the quality of research data, topics central to clinical research informatics. This chapter covers the basics of acquiring or collecting and processing data for research given the available data sources, systems, and people. Data quality dimensions specific to the clinical research context are used, and a framework for data quality practice and planning is presented. Available research is summarized, providing estimates of data quality capability for common clinical research data collection and processing methods. This chapter provides researchers, informaticists, and clinical research data managers basic tools to assure, assess, and control the quality of data for research.

Keywords: Clinical research data; Data quality; Research data collection; Processing methods; Informatics; Management of clinical data; Data accuracy; Secondary use


Citations (66)


... In Open Biological and Biomedical Ontology Foundry (OBO Foundry) [51] there are well over 100 ontologies currently maintained and active. There are many ongoing efforts to combine all knowledge relating to human health into a single large knowledge graph such as PheKnowlater [52] and the Integrated Monarch Ontology [53]. A clear future direction of this work is to include, learn from, and use a more full representation of human biology. ...

Reference:

Hypothesis generation for rare and undiagnosed diseases through clustering and classifying time-versioned biological ontologies
An open source knowledge graph ecosystem for the life sciences

Scientific Data

... These thyroid cancer-associated syndromes are thought to be rare, with the population prevalence ranging from 1:7000 for FAP 6 to 1:600,000-4,000,000 for MEN3 (formerly known as MEN2B). 10 Due to the growth of large biobanks such as the All of Us Research Program (AoU) 11 and Colorado Center for Personalized Medicine (CCPM) 12 , more people in the United States get their genomes sequenced irrespectively of the clinical indication. This leads to the incidental discovery of actionable variants in the reportable genes on the American College of Medical Genetics and Genomics (ACMG) list. ...

Building a vertically integrated genomic learning health system: The biobank at the Colorado Center for Personalized Medicine

The American Journal of Human Genetics

... Based on the evaluation, the customized CDS tool was found to be more effective, and clinicians expressed a preference for its continued use. As a result, all clinics have since transitioned to the customized CDS tool, and its effectiveness has been sustained [32]. It was also determined that the CDS tool needs to include additional evidence-based medications for heart failure and should be expanded to cardiology clinics. ...

Sustained Effect of Clinical Decision Support for Heart Failure: A Natural Experiment Using Implementation Science
  • Citing Article
  • October 2023

Applied Clinical Informatics

... For example, tools like PheRS, which measures the similarity between an individual's diagnosis codes and phenotypic features of known genetic disorders, also require mappings that link ICD codes to HPO terms, which most EHRs do not contain. 20 In such scenarios, unless the researcher is an expert in ontology, they will most likely turn to resources like the UMLS tables for these mappings; hence, determining the coverage of ICD10-CM to HPO mappings with the UMLS table is imperative. ...

Ontologizing health systems data at scale: making translational discovery a reality

npj Digital Medicine

... Federated analytics and federated learning are supported through the establishment of a federated network, a series of decentralised, interconnected nodes where the various data is stored [12]. Federated networks help address the challenges of data silos and fragmentation, by enabling researchers to analyse data from multiple sources without the need to transfer or centralise data. ...

Contextualising adverse events of special interest to characterise the baseline incidence rates in 24 million patients with COVID-19 across 26 databases: a multinational retrospective cohort study

EClinicalMedicine

... and LOINC2HPO. 19,20 The OMOP2OBO algorithm was developed to generate mappings between clinical vocabularies in the OMOP common data model and eight Open Biomedical Foundry ontologies 21 spanning diseases, phenotypes, anatomical entities, organisms, chemicals, vaccines, and proteins. Using this algorithm, a large-scale set of mappings was developed, which includes 92,367 conditions, 8615 drug ingredients, and 10,673 measurement results. ...

Ontologizing Health Systems Data at Scale: Making Translational Discovery a Reality

... Therefore, EHR can be improved and optimized through information visualization techniques, such as adopting appropriate layouts, visual encodings and interactive tools, to enhance further the decision-making effectiveness of medical staff. Current visualization research has two main parts; the first part uses machine learning or natural language processing [15][16][17][18][19][20][21][22], and the second part is mainly based on events [23][24][25][26][27][28][29][30]; these techniques enable these data to be transformed into valuable medical information [31]. ...

ReviewR: a light-weight and extensible tool for manual review of clinical records

JAMIA Open

... The promise of clinical applications of PGS across diseases is that they provide a possible biomarker that can be used to guide clinical practice. Specific to diLQTS, such information could be easily integrated into clinical decision-support tools, particularly in the emerging era of biobanks such as available at the University of Colorado [32,33], incorporating genetic information that could be obtained prior to other clinical diagnostic information, such as the QT interval itself on an ECG [26,30]. However, to understand how this information might be used clinically, more work is needed to examine the predictive accuracy itself, as well as how the information might be integrated into care decisions. ...

Building a Vertically-Integrated Genomic Learning Health System: The Colorado Center for Personalized Medicine Biobank
  • Citing Preprint
  • June 2022

... Researchers collected 924 features from 597 patients diagnosed with LC and trained three ML models to predict LC and discriminate LC from COVID-19 patients. Their results showed that the gradient boost (XGBoost) algorithm achieved an AUC value of 0.92 for all patients, 0.90 for hospitalized patients, and 0.85 for outpatients; the results from their model are presented in Figure 4. 96 Also, Patel et al. (2023) studied the expression of 2925 unique blood proteins in LC outpatients compared to that in COVID-19 patients and healthy controls. ML analysis identified 119 relevant proteins for discriminating LC outpatients, with nine and five protein combinations with high sensitivity and specificity for LC (AUC = 1.00, ...

Identifying who has long COVID in the USA: a machine learning approach using N3C data

The Lancet Digital Health