
Elmer Bernstam, The University of Texas Health Science Center at Houston
195
Publications
24,441
Reads
5,922
Citations
Publications (195)
Electronic health record (EHR) data are a rich and invaluable source of real-world clinical information, enabling detailed insights into patient populations, treatment outcomes, and healthcare practices. The availability of large volumes of EHR data is critical for advancing translational research and developing innovative technologies such as art...
Background
Scalable identification of patients with post-acute sequelae of COVID-19 (PASC) is challenging due to a lack of reproducible precision phenotyping algorithms, which has led to suboptimal accuracy, demographic biases, and underestimation of PASC.
Methods
In a retrospective case-control study, we developed a precision phenotyping algo...
Objective:
Duplicate patient records can increase cost and medical errors. We assessed the association between demographic factors, comorbidities, healthcare usage and duplicate electronic health records.
Materials and methods:
We analyzed the association between duplicate patient records and multiple demographic variables (race, Hispanic ethnic...
Objectives
Healthcare organizations, including Clinical and Translational Science Awards (CTSA) hubs funded by the National Institutes of Health, seek to enable secondary use of electronic health record (EHR) data through an enterprise data warehouse for research (EDW4R), but optimal approaches are unknown. In this qualitative study, our goal was t...
Scalable identification of patients with the post-acute sequelae of COVID-19 (PASC) is challenging due to a lack of reproducible precision phenotyping algorithms and the suboptimal accuracy, demographic biases, and underestimation of the PASC diagnosis code (ICD-10 U09.9). In a retrospective case-control study, we developed a precision phenotyping...
Introduction
The focus on social determinants of health (SDOH) and their impact on health outcomes is evident in U.S. federal actions by the Centers for Medicare & Medicaid Services and the Office of the National Coordinator for Health Information Technology. The disproportionate impact of COVID-19 on minorities and communities of color heightened awareness of...
Multiple Sclerosis (MS) is a chronic disease developed in the human brain and spinal cord, which can cause permanent damage or deterioration of the nerves. The severity of MS disease is monitored by the Expanded Disability Status Scale, composed of several functional sub-scores. Early and accurate classification of MS disease severity is critical f...
Variation in availability, format, and standardization of patient attributes across health care organizations impacts patient-matching performance. We report on the changing nature of patient-matching features available from 2010–2020 across diverse care settings. We asked 38 health care provider organizations about their current patient attribute...
Objective:
Medication discrepancies between clinical systems may pose a patient safety hazard. In this paper, we identify challenges and quantify medication discrepancies across transitions of care.
Materials and methods:
We used structured clinical data and free-text hospital discharge summaries to compare active medications lists at four time...
Background
We propose a new deep learning model to identify unnecessary hemoglobin (Hgb) tests for patients admitted to the hospital, which can help reduce health risks and healthcare costs.
Methods
We collected internal patient data from a teaching hospital in Houston and external patient data from the MIMIC III database. The study used a conserv...
Multiple Sclerosis (MS) is a chronic disease developed in the human brain and spinal cord, which can cause permanent damage or deterioration of the nerves. The severity of MS disease is monitored by the Expanded Disability Status Scale (EDSS), composed of several functional sub-scores. Early and accurate classification of MS disease severity is critica...
Building on previous work to define the scientific discipline of biomedical informatics, we present a framework that categorizes fundamental challenges into groups based on data, information, and knowledge, along with the transitions between these levels. We define each level and argue that the framework provides a basis for separating informatics...
Unfractionated heparin (UFH) and low molecular weight heparin (LMWH) are often administered to prevent venous thromboembolism (VTE) in critically ill patients. However, the preferred prophylactic agent (UFH or LMWH) is not known. We compared the all-cause mortality rate in patients receiving UFH versus LMWH for VTE prophylaxis. We conducted a retrospec...
Objective:
SNOMED CT is the largest clinical terminology worldwide. Quality assurance of SNOMED CT is of utmost importance to ensure that it provides accurate domain knowledge to various SNOMED CT-based applications. In this work, we introduce a deep learning-based approach to uncover missing is-a relations in SNOMED CT.
Materials and methods:
O...
Objective:
To evaluate tokens commonly used by clinical research consortia to aggregate clinical data across institutions.
Materials and methods:
This study compares tokens alone and token-based matching algorithms against manual annotation for 20,002 record pairs extracted from University of Texas Houston (UTH)'s clinical data warehouse in term...
Background
Diabetes and depression affect a significant percentage of the world’s total population, and the management of these conditions is critical for reducing the global burden of disease. Medication adherence is crucial for improving diabetes and depression outcomes, and research is needed to elucidate barriers to medication adherence, includ...
Unnecessary laboratory tests present health risks and increase healthcare costs. We propose a new deep learning model to identify unnecessary hemoglobin (Hgb) tests for patients admitted to the hospital. Machine learning models might generate less reliable results due to noisy inputs containing low-quality information. We estimate prediction confid...
Although pharmaceutical products undergo clinical trials to profile efficacy and safety, some adverse drug reactions (ADRs) are only discovered after release to market. Post-market drug safety surveillance - pharmacovigilance - leverages information from various sources to proactively identify such ADRs. Clinical notes are one source of observation...
Providers currently rely on universal screening to identify health-related social needs (HRSNs). Predicting HRSNs using EHR and community-level data could be more efficient and less resource intensive. Using machine learning models, we evaluated the predictive performance of HRSN status from EHR and community-level social determinants of health (SD...
Objective:
Among National Institutes of Health Clinical and Translational Science Award (CTSA) hubs, effective approaches for enterprise data warehouses for research (EDW4R) development, maintenance, and sustainability remain unclear. The goal of this qualitative study was to understand CTSA EDW4R operations within the broader contexts of academic...
Objectives
Scanned documents (SDs), while common in electronic health records and potentially rich in clinically relevant information, rarely fit well with clinician workflow. Here, we identify scanned imaging reports requiring follow-up with high recall and practically useful precision.
Materials and methods
We focused on identifying imaging find...
PURPOSE
The Medicare Access and CHIP Reauthorization Act of 2015 (MACRA) requires eligible clinicians to report clinical quality measures (CQMs) in the Merit-Based Incentive Payment System (MIPS) to maximize reimbursement. To determine whether structured data in electronic health records (EHRs) were adequate to report MIPS CQMs, EHR data aggregated...
Background
Unnecessary labs contribute to iatrogenic harm and are a major source of waste in the healthcare system. We previously developed a machine learning algorithm to help clinicians identify unnecessary laboratory tests, but it has not been externally validated. In this study, we externally validate our ML algorithm.
Methods
To externally va...
Objectives
Electronic health records (EHRs) contain a large quantity of machine-readable data. However, institutions choose different EHR vendors, and the same product may be implemented differently at different sites. Our goal was to quantify the interoperability of real-world EHR implementations with respect to clinically relevant structured data...
Introduction
In the context of competency-based medical education, poor student performance must be accurately documented to allow learners to improve and to protect the public. However, faculty may be reluctant to provide evaluations that could be perceived as negative, and clerkship directors report that some students pass who should have failed....
Background: Diabetes and depression affect a significant percentage of the world’s total population, and the management of these conditions is critical for reducing the global burden of disease. Medication adherence is critical for improving diabetes and depression outcomes, and research is needed to elucidate barriers to medication adherence, includin...
Artificial intelligence (AI) is transforming many domains including finance, agriculture, defense and biomedicine. In this paper, we focus on the role of AI in clinical and translational research (CTR) including pre‐clinical research (T1), clinical research (T2), clinical implementation (T3) and public (or population) health (T4). Given the rapid e...
Background and Aims
Endoscopic ultrasound (EUS), magnetic resonance cholangiopancreatography (MRCP), and intraoperative cholangiogram (IOC) are the recommended diagnostic modalities for patients with intermediate probability for choledocholithiasis (IPC). The relative cost-effectiveness of these modalities in patients with cholelithiasis and IPC is...
The comprehensive characterization of clinical and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) testing data for patients with repeatedly positive SARS-CoV-2 tests can help prioritize suspected cases of reinfection for investigation in the absence of sequencing data and for continued surveillance of the potential long-term health co...
Objective:
To expand the Operative Stress Score (OSS) by increasing procedural coverage, and to assess the association of OSS and frailty with Preoperative Acute Serious Conditions (PASC), complications, and mortality in females versus males.
Summary background data:
Veterans Affairs male-dominated study showed high mortality in frail veterans even after very low...
Background
In the absence of genome sequencing, two positive molecular SARS-CoV-2 tests separated by negative tests, prolonged time, and symptom resolution remain the best surrogate measure of possible re-infection.
Methods
Using a large electronic health record database, we characterized clinical and testing data for 23 patients with repeatedly p...
We assessed the scalability of a pharmacological signal detection use case from a single-site CDW to a large aggregated clinical data warehouse (single-site database with 754,214 distinct patient IDs vs. multisite database with 49.8M). We aimed to explore whether a larger clinical dataset would provide clearer signals for secondary analyses such as d...
The paradigm of representation learning through transfer learning has the potential to greatly enhance clinical natural language processing. In this work, we propose a multi-task pre-training and fine-tuning approach for learning generalized and transferable patient representations from medical language. The model is first pre-trained with differen...
Introduction
Drug safety research asks causal questions but relies on observational data. Confounding bias threatens the reliability of studies using such data. The successful control of confounding requires knowledge of variables called confounders affecting both the exposure and outcome of interest. Causal knowledge of dynamic biological systems...
Objective
To build a machine-learning model that predicts laboratory test results and provides a promising lab test reduction strategy, using spatial-temporal correlations.
Materials and methods
We developed a global prediction model to treat laboratory testing as a series of decisions by considering contextual information over time and across mod...
Objectives
Electronic Health Records (EHRs) contain scanned documents from a variety of sources such as identification cards, radiology reports, clinical correspondence, and many other document types. We describe the distribution of scanned documents at one health institution and describe the design and evaluation of a system to categorize document...
Background
Alzheimer's disease (AD) is a devastating disease, and its pathophysiology is still largely unknown. No treatment has been shown to be efficacious, so prevention remains a very valuable approach. The objective of this work is to statistically test the relationship between influenza vaccination and the incidence of AD to identify a candidate for AD prevention....
Background and aims:
The American Society for Gastrointestinal Endoscopy (ASGE) 2010 guidelines for suspected choledocholithiasis were recently updated by proposing more-specific criteria for selection of high-risk patients to undergo direct ERCP, while advocating use of additional imaging studies for intermediate- and low-risk individuals. We aim...
Defining patient-to-patient similarity is essential for the development of precision medicine in clinical care and research. Conceptually, the identification of similar patient cohorts appears straightforward; however, universally accepted definitions remain elusive. Simultaneously, an explosion of vendors and published algorithms has emerged and...
PURPOSE: Genomic analysis of individual patients is now affordable, and therapies targeting specific molecular aberrations are being tested in clinical trials. Genomically-informed therapy is relevant to many clinical domains, but is particularly applicable to cancer treatment. However, even specialized clinicians need help to interpret genomic dat...
Introduction:
Confounding bias threatens the reliability of observational studies and poses a significant scientific challenge.
This paper introduces a framework for identifying confounding factors by exploiting literature-derived computable knowledge.
In previous work, we have shown that semantic constraint search over computable knowledge extract...
Global pandemics call for large and diverse healthcare data to study various risk factors, treatment options, and disease progression patterns. Despite the enormous efforts of many large data consortium initiatives, the scientific community still lacks a secure and privacy-preserving infrastructure to support auditable data sharing and facilitate autom...
Global pandemics call for large and diverse healthcare data to study various risk factors, treatment options, and disease progression patterns. Despite the enormous efforts of many large data consortium initiatives, the scientific community still lacks a secure and privacy-preserving infrastructure to support auditable data sharing and facilitate...
Importance
Suicide is a leading cause of mortality, with suicide-related deaths increasing in recent years. Automated methods for individualized risk prediction have great potential to address this growing public health threat. To facilitate their adoption, they must first be validated across diverse health care settings.
Objective
To evaluate the...
Serial laboratory testing is common, especially in Intensive Care Units (ICU). Such repeated testing is expensive and may even harm patients. However, identifying specific tests that can be omitted is challenging. The search space of different lab tests is large and the optimal reduction is hard to determine without modeling the time trajectory of...
Objective:
There is a lot of information about cancer in Electronic Health Record (EHR) notes that can be useful for biomedical research provided natural language processing (NLP) methods are available to extract and structure this information. In this paper, we present a scoping review of existing clinical NLP literature for cancer.
Methods:
We...
[This corrects the article DOI: 10.18632/oncotarget.16018.].
The well-known hazards of repurposing data make Data Quality (DQ) assessment a vital step towards ensuring valid results regardless of analytical methods. However, there is no systematic process to implement DQ assessments for secondary uses of clinical data. This paper presents DataGauge, a systematic process for designing and implementing DQ asse...
Objective:
The study sought to design, pilot, and evaluate a federated data completeness tracking system (CTX) for assessing completeness in research data extracted from electronic health record data across the Accessible Research Commons for Health (ARCH) Clinical Data Research Network.
Materials and methods:
The CTX applies a systems-based app...
Purpose:
Many targeted therapies are currently available only via clinical trials. Therefore, routine precision oncology using biomarker-based assignment to drug depends on matching patients to clinical trials. A comprehensive and up-to-date trial database is necessary for optimal patient-trial matching.
Methods:
We describe processes for establ...
e18080
Background: Implementation of electronic health records (EHRs) has engendered a large quantity of machine-readable data. However, different practices choose different EHR vendors and the same vendor product may be implemented differently at each practice. Motivated by the desire to facilitate appropriate integration of data, our goal was to...
e18074
Background: Physician reimbursement for care delivered to Medicare beneficiaries fundamentally changed with the 2015 MACRA legislation, requiring eligible clinicians to report quality measures in the Merit-Based Incentive Payment System (MIPS). To determine whether structured data in electronic health records (EHRs) were adequate to report M...
Data S1. Clinical and administrative data reuse for research protocol.
Objective: There is a lot of information about cancer in Electronic Health Record (EHR) notes that can be useful for biomedical research provided natural language processing (NLP) methods are available to extract and structure this information. In this paper, we present a scoping review of existing clinical NLP literature for cancer. Methods: We id...
Electronic health records are valuable for clinical and translational research. Institutions must protect patient privacy and comply with applicable regulations while allowing appropriate access to clinical data for research. The processes that investigators must follow to access clinical data can be substantially different at different institution...
Purpose:
We examined patterns, correlates, and the impact of cancer-related Internet use among patients with advanced cancer in a phase I clinical trials clinic for molecularly targeted oncologic agents.
Methods:
An anonymous questionnaire on Internet use for cancer-related purposes that incorporated input from phase I clinical trial oncologists...
There are an ever-increasing number of reports and commentaries that describe the challenges and opportunities associated with the use of big data and data science (DS) in the context of biomedical education, research, and practice. These publications argue that there are substantial benefits resulting from the use of data-centric approaches to sol...
Background:
The role of cancer-related internet use on the patient-physician relationship has not been adequately explored among patients who are cancer-related internet users (CIUs) in early-phase clinical trial clinics.
Objective:
We examined the association between cancer-related internet use and the patient-physician relationship and decisio...
With the increasing availability of genomics, routine analysis of advanced cancers is now feasible. Treatment selection is frequently guided by the molecular characteristics of a patient's tumor, and an increasing number of trials are genomically-selected. Furthermore, multiple studies have demonstrated the benefit of therapies that are chosen base...
Background:
Genomic testing is increasingly performed in oncology, but concerns remain regarding the clinician's ability to interpret results. In the current study, the authors sought to determine the agreement between physicians and genomic annotators from the Precision Oncology Decision Support (PODS) team at The University of Texas MD Anderson...
High-throughput genomic and molecular profiling of tumors is emerging as an important clinical approach. Molecular profiling is increasingly being used to guide cancer patient care, especially in advanced and incurable cancers. However, navigating the scientific literature to make evidence-based clinical decisions based on molecular profiling resul...
Objective:
One promise of nationwide adoption of electronic health records (EHRs) is the availability of data for large-scale clinical research studies. However, because the same patient could be treated at multiple health care institutions, data from only a single site might not contain the complete medical history for that patient, meaning that...
At the ASCO Data Standards and Interoperability Summit held in May 2016, it was unanimously decided that four areas of current oncology clinical practice have serious, unmet health information technology needs. The following areas of need were identified: 1) omics and precision oncology, 2) advancing interoperability, 3) patient engagement, and 4)...
In the information age, we expect data systems to make us more effective and efficient, not to make our lives more difficult! In this article, we discuss how we are using data systems, such as electronic health records (EHRs), to improve care delivery. We illustrate how US Oncology is beginning to use real-world evidence to facilitate trial accrual...
Background:
Patient matching is a key barrier to achieving interoperability. Patient demographic elements must be consistently collected over time and region to be valuable elements for patient matching.
Objectives:
We sought to determine what patient demographic attributes are collected at multiple institutions in the United States and see how...
Purpose:
Molecular profiling performed in the research setting usually does not benefit the patients that donate their tissues. Through a prospective protocol, we sought to determine the feasibility and utility of performing broad genomic testing in the research laboratory for discovery, and the utility of giving treating physicians access to rese...
Observational data recorded in the Electronic Health Record (EHR) can help us better understand the effects of therapeutic agents in routine clinical practice. As such data were not collected for research purposes, their reuse for research must compensate for additional information that may bias analyses and lead to faulty conclusions. Confounding...
Observational data recorded in the Electronic Health Record (EHR) can help us better understand the effects of therapeutic agents in routine clinical practice. As such data were not collected for research purposes, their reuse for research must compensate for additional information that may bias analyses and ultimately lead to faulty conclusions. C...
This special issue on precision medicine informatics flowed from the AMIA 2015 Translational Bioinformatics Summit theme of “Accelerating Precision Medicine”1 and President Obama’s 2015 State of the Union call “to give all of us access to the personalized information we need to keep ourselves and our families healthier.”2 The goal is to focus on th...
Jun Xu, Hee-Jin Lee, Jia Zeng, [...], Hua Xu
Objective:
Clinical trials investigating drugs that target specific genetic alterations in tumors are important for promoting personalized cancer therapy. The goal of this project is to create a knowledge base of cancer treatment trials with annotations about genetic alterations from ClinicalTrials.gov.
Methods:
We developed a semi-automatic fra...
Introduction Genomic profiling information is frequently available to oncologists, enabling targeted cancer therapy. Because clinically relevant information is rapidly emerging in the literature and elsewhere, there is a need for informatics technologies to support targeted therapies. To this end, we have developed a system for Automated Identifica...
Objective:
To evaluate whether vector representations encoding latent topic proportions that capture similarities to MeSH terms can improve performance on biomedical document retrieval and classification tasks, compared to using MeSH terms.
Materials and methods:
We developed the TopicalMeSH representation, which exploits the 'correspondence' be...
Background:
Understanding patients' knowledge and prior information-seeking regarding personalized cancer therapy (PCT) may inform future patient information systems, consent for molecular testing and PCT protocols. We evaluated breast cancer patients' knowledge and information-seeking behaviors regarding PCT.
Methods:
Newly registered female br...
Large clinical datasets can be used to discover and monitor drug side effects. Many previous studies analyzed symptom data as discrete events. However, some drug side effects are inferred from continuous variables such as weight or blood pressure. These require additional assumptions for analysis. For example, we can define positive/negative thresh...
Rapidly improving understanding of molecular oncology, emerging novel therapeutics, and increasingly available and affordable next-generation sequencing have created an opportunity for delivering genomically informed personalized cancer therapy. However, implementing genomically informed therapy requires that a clinician interpret the patient's mol...
Automatically identifying specific phenotypes in free-text clinical notes is critically important for the reuse of clinical data. In this study, the authors combine expert-guided feature (text) selection with one-class classification for text processing.
To compare the performance of one-class classification to traditional binary classification; to...
Thirty-Seventh Annual CTRC-AACR San Antonio Breast Cancer Symposium; December 9-13, 2014; San Antonio, TX
INTRODUCTION: Breast cancer patients and providers are increasingly interested in personalized cancer therapy. Information-seeking behaviors and knowledge about personalized cancer therapy, cancer genetics, and molecular testing may influence...
Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguat...
In alignment with a major shift toward patient-centered care as the model for improving care in our health system, informatics is transforming patient-provider relationships and overall care delivery. AMIA's 2013 Health Policy Invitational was focused on examining existing challenges surrounding full engagement of the patient and crafting a researc...
BACKGROUND
This study assessed attitudes of breast cancer patients toward molecular testing for personalized therapy and research.
METHODS
A questionnaire was given to female breast cancer patients presenting to a cancer center. Associations between demographic and clinical variables and attitudes toward molecular testing were evaluated.
RESULTS
Three...