About
786 Publications
138,703 Reads
17,005 Citations
Introduction
Dr. Hongfang Liu's primary research focus is to facilitate the use of real-world data for clinical and translational science research and health care delivery improvement using data science, artificial intelligence, and informatics approaches.
Current institution
Additional affiliations
January 2006 - January 2011
May 2005 - December 2011
January 2005 - December 2012
Publications (786)
Background
There is a significant delay between symptom onset and diagnosis of childhood asthma, but the impact of this delay on asthma outcomes has not been well understood.
Objectives
We sought to study the association of delayed diagnosis of asthma with asthma exacerbations (AEs) in children.
Methods
Using the Mayo Clinic birth cohort, we iden...
Understanding the disease trajectories of specific diseases can provide important clinical insights. In this paper, we aimed to discover signature disease trajectories of 3 rare cancer types: pancreatic cancer, soft tissue sarcoma (STS) of the trunk and extremity (STS-TE), and STS of the abdomen and retroperitoneum (STS-AR), leveraging IQVIA Oncolo...
Electronic health record (EHR) data are a rich and invaluable source of real-world clinical information, enabling detailed insights into patient populations, treatment outcomes, and healthcare practices. The availability of large volumes of EHR data is critical for advancing translational research and developing innovative technologies such as art...
Objective: Suicide is a critical medical and public health challenge, particularly among individuals with mental illnesses in safety-net hospitals. To uncover insights about suicidality embedded in unstructured clinical notes, we propose to annotate, analyze, and model a corpus for suicidality understanding and lifesaving.
Methods: A multidisciplin...
BACKGROUND
Complementary therapies are being increasingly used by cancer patients. As a channel for customers to share their feelings, outcomes, ideas, and perceived knowledge about the products purchased from e-commerce platforms, Amazon online reviews are a valuable real-world data source for health care studies.
OBJECTIVE
In this study, we aim...
Obesity affects approximately 34% of adults and 15–20% of children and adolescents in the U.S. and poses significant economic and psychosocial burdens. Due to the multifaceted nature of obesity, patient responses to any single anti-obesity medication (AOM) currently vary significantly, highlighting the need for developing approaches to obesity deep...
Background: Over the past two decades, the Food and Drug Administration (FDA) has significantly increased the approval of anti-obesity medications (AOMs) for obesity management. Both FDA-approved AOMs (F-AOMs) and off-label AOMs (O-AOMs) have gained popularity and demonstrated promising results in randomized clinical trials (RCTs). However, their e...
Idiopathic pulmonary fibrosis (IPF) is a rare disease that is challenging to diagnose. Patients with IPF often spend years awaiting a diagnosis after the onset of initial respiratory symptoms, and only a small percentage receive antifibrotic treatment. In this study, we examine the associations between social determinants of health (SDoH) and two c...
Introduction
While incidentally discovered covert cerebrovascular diseases (id-CCD) are associated with future stroke, it is not known if patients with id-CCD are prescribed statins.
Methods
Patients age ≥50 with id-CCD on neuroimaging from 2009 to 2019 with no prior ischaemic stroke, transient ischaemic attack or dementia were identified using na...
Diagnosis prediction is a critical task in healthcare, where timely and accurate identification of medical conditions can significantly impact patient outcomes. Traditional machine learning and deep learning models have achieved notable success in this domain but often lack interpretability which is a crucial requirement in clinical settings. In th...
Accurate identification and categorization of suicidal events can yield better suicide precautions, reduce operational burden, and improve care quality in high-acuity psychiatric settings. Pre-trained language models offer promise for identifying suicidality from unstructured clinical narratives. We evaluated the performance of four BERT-based...
Reoperation is the most significant complication following any surgical procedure. Developing machine learning methods that predict the need for reoperation will allow for improved shared surgical decision making and patient-specific and preoperative optimisation. Yet, no precise machine learning models have been published to perform well in predic...
Perioperative organ injury represents a significant clinical challenge, leading to severe and often long-term complications for patients undergoing surgery. Despite its critical importance, the basic science underlying these injuries remains underexplored. This study employs bibliometric analysis to assess the research landscape of perioperative or...
Background
The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) that is developed and maintained by the Observational Health Data Sciences and Informatics (OHDSI) community supports large scale cancer research by enabling distributed network analysis. As the number of studies using the OMOP CDM for cancer research increases...
Traditional biomedical artificial intelligence (AI) models, designed for specific tasks or modalities, often exhibit limited flexibility in real-world deployment and struggle to utilize holistic information. Generalist AI holds the potential to address these limitations due to its versatility in interpreting different data types and generating tail...
Background
Despite their growing use in health care, pretrained language models (PLMs) often lack clinical relevance due to insufficient domain expertise and poor interpretability. A key strategy to overcome these challenges is integrating external knowledge into PLMs, enhancing their adaptability and clinical usefulness. Current biomedical knowled...
Large Language Models (LLMs) have revolutionized various sectors, including healthcare where they are employed in diverse applications. Their utility is particularly significant in the context of rare diseases, where data scarcity, complexity, and specificity pose considerable challenges. In the clinical domain, Named Entity Recognition (NER) stand...
Objectives
The surge in patient portal messages (PPMs) with increasing needs and workloads for efficient PPM triage in healthcare settings has spurred the exploration of AI-driven solutions to streamline the healthcare workflow processes, ensuring timely responses to patients to satisfy their healthcare needs. However, there has been less focus on...
Medical large language models (LLMs) such as ClinicalCamel 70B and Llama3-OpenBioLLM 70B have demonstrated impressive performance on a wide variety of medical NLP tasks. However, there is still no LLM specifically designed for the cancer domain. Moreover, these LLMs typically have billions of parameters, making them computationally...
The consistent and persuasive evidence illustrating the influence of social determinants on health has prompted a growing realization throughout the health care sector that enhancing health and health equity will likely depend, at least to some extent, on addressing detrimental social determinants. However, detailed social determinants of health (S...
Background
Error analysis plays a crucial role in clinical concept extraction, a fundamental subtask within clinical natural language processing (NLP). The process typically involves a manual review of error types, such as contextual and linguistic factors contributing to their occurrence, and the identification of underlying causes to refine the N...
Background: Postoperative ileus (POI) after colorectal surgery leads to increased morbidity, costs, and hospital stays. Identifying POI risk for early intervention is important for improving surgical outcomes especially given the increasing trend towards early discharge after surgery. While existing studies have assessed POI risk with regression mo...
Objectives
Generative large language models (LLMs) are a subset of transformer-based neural network architecture models. LLMs have successfully leveraged a combination of an increased number of parameters, improvements in computational efficiency, and large pre-training datasets to perform a wide spectrum of natural language processing (NLP) tasks...
Introduction
Letters of recommendation (LOR) are an integral component of physical therapy residency applications. Identifying the influence of applicant and writer gender in LOR will help identify whether potential implicit gender bias exists in physical therapy residency application processes.
Review of Literature
Several medical and surgical re...
The use of digital twins (DTs) has proliferated across various fields and industries, with a recent surge in the healthcare sector. The concept of digital twin for health (DT4H) holds great promise to revolutionize the entire healthcare system, including management and delivery, disease treatment and prevention, and health well-being maintenance, u...
Covert cerebrovascular disease (CCD) is frequently reported on neuroimaging and associates with increased dementia and stroke risk. We aimed to determine how incidentally-discovered CCD during clinical neuroimaging in a large population associates with mortality. We screened CT and MRI reports of adults aged ≥50 in the Kaiser Permanente Southern Ca...
We investigate risk factors for severe COVID-19 in persons living with HIV (PWH), including among racialized PWH, using the U.S. population-sampled National COVID Cohort Collaborative (N3C) data released from January 1, 2020 to October 10, 2022. We defined severe COVID-19 as hospitalized with invasive mechanical ventilation, extracorporeal membrane...
Advances in genetic technology have led to the increasing use of genomic panels in precision oncology practice, with panels ranging from a couple to hundreds of genes. However, the clinical utilization and utility of oncology genomic panels, especially among vulnerable populations, is unclear. We examined the association of panel size with socioeco...
Clinical information retrieval (IR) plays a vital role in modern healthcare by facilitating efficient access and analysis of medical literature for clinicians and researchers. This scoping review aims to offer a comprehensive overview of the current state of clinical IR research and identify gaps and potential opportunities for future studies in th...
To address the current and long‐term unmet health needs of the growing population of non‐Hodgkin lymphoma (NHL) patients, we established the Lymphoma Epidemiology of Outcomes (LEO) cohort study (NCT02736357; https://leocohort.org/). A total of 7735 newly diagnosed patients aged 18 years and older with NHL were prospectively enrolled from 7/1/2015 t...
Background: Covert cerebrovascular disease (CCD) includes white matter disease (WMD) and covert brain infarction (CBI). Incidentally-discovered CCD is associated with increased risk of subsequent symptomatic stroke. However, it is unknown whether the severity of WMD or the location of CBI predicts risk. Objectives: To examine the association of inc...
"The Great Resignation" has become a concern for many in healthcare since the pandemic. Inspired by the literature on social network analysis, we applied SNA techniques to analyze the impact of the Great Resignation on a large health research center. We found that although the great resignation has caused evident turbulence among inter-program and...
Traditional trial designs have well-recognized inefficiencies and logistical barriers to participation. Decentralized trials and digital health solutions have been suggested as potential solutions and have certainly risen to the challenge during the pandemic. Clinical trial designs are now increasingly data driven. The use of distributed clinical d...
Objective
The effects of Coronavirus disease 2019 (COVID-19) infection and altered processes of care on nonelective coronary artery bypass grafting (CABG) outcomes remain unknown. We hypothesized that patients with COVID-19 infection would have longer hospital lengths of stay and greater mortality compared with COVID-negative patients, but that the...
Introduction
We tested the ability of our natural language processing (NLP) algorithm to identify delirium episodes in a large-scale study using real-world clinical notes.
Methods
We used the Rochester Epidemiology Project to identify persons ≥ 65 years who were hospitalized between 2011 and 2017. We identified all persons with an International Cl...
Objectives
To define pregnancy episodes and estimate gestational age within electronic health record (EHR) data from the National COVID Cohort Collaborative (N3C).
Materials and Methods
We developed a comprehensive approach, named Hierarchy and rule-based pregnancy episode Inference integrated with Pregnancy Progression Signatures (HIPPS), and app...
Accurate prediction models for individual-level endpoints and time-to-endpoints are crucial in clinical practice. In this study, we propose a novel approach, GRU-D-Weibull, which combines gated recurrent units with decay (GRU-D) to model the Weibull distribution. Our method enables real-time individualized endpoint prediction and population-level r...
Despite recent methodology advancements in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. Concurrently, these factors also dramatically increase the difficulty in developing NLP models in multi-site set...
Objective:
Transformer-based language models are prevailing in the clinical domain due to their excellent performance on clinical NLP tasks. The generalizability of those models is usually ignored during the model development process. This study evaluated the generalizability of CancerBERT, a Transformer-based clinical NLP model, along with classi...
Clinical phenotyping is often a foundational requirement for obtaining datasets necessary for the development of digital health applications. Traditionally done via manual abstraction, this task is often a bottleneck in development due to time and cost requirements, therefore raising significant interest in accomplishing this task via in-silico mea...
Integrating machine learning (ML) models into clinical practice presents a challenge of maintaining their efficacy over time. While existing literature offers valuable strategies for detecting declining model performance, there is a need to document the broader challenges and solutions associated with the real-world development and integration of m...
Background
Extractive question-answering (EQA) is a useful natural language processing (NLP) application for answering patient-specific questions by locating answers in their clinical notes. Realistic clinical EQA can yield multiple answers to a single question and multiple focus points in 1 question, which are lacking in existing data sets for the...
Gender stereotyping is the practice of assigning or ascribing specific characteristics, differences, or identities to a person solely based on their gender. Biased conceptions of gender can create barriers to equality and need to be proactively identified and addressed. In biomedical education, letters of recommendation (LOR) are considered an impo...
The rapid proliferation and implementation of electronic health record (EHR) systems have reshaped the documentation and management of patient data. This transformation has facilitated and accelerated the secondary use of EHRs for clinical research. A common approach to leveraging EHRs is via manual chart review, a process of reviewing or extractin...
BACKGROUND
A wealth of clinically relevant information is only obtainable within unstructured clinical narratives, leading to great interest in clinical natural language processing (NLP). While a multitude of approaches to NLP exist, current algorithm development approaches have limitations that can slow the development process. These limitations a...
Objective:
Social determinants of health (SDoH) play critical roles in health outcomes and well-being. Understanding the interplay of SDoH and health outcomes is critical to reducing healthcare inequalities and transforming a "sick care" system into a "health-promoting" system. To address the SDoH terminology gap and better embed relevant elements...
In this paper, we introduce a unified and generalist Biomedical Generative Pre-trained Transformer (BiomedGPT) model, which leverages self-supervision on large and diverse datasets to accept multi-modal inputs and perform a range of downstream tasks. Our experiments demonstrate that BiomedGPT delivers expansive and inclusive representations of biom...
In 2020, the CoViD-19 pandemic spread worldwide in an unexpected way and suddenly modified many life issues, including social habits, social relationships, teaching modalities, and more. Such changes were also observable in many different healthcare and medical contexts. Moreover, the CoViD-19 pandemic acted as a stress test for many research endea...
Family history (FH) is important for disease risk assessment and prevention. However, incorporating FH information derived from electronic health records (EHRs) for downstream analytics is challenging due to the lack of standardization. We aimed to automatically align FH concepts derived from a clinical corpus to disease category resources popularl...
A gold standard annotated corpus is usually indispensable when developing natural language processing (NLP) systems. Building a high-quality annotated corpus for clinical NLP requires considerable time and domain expertise during the annotation process. Existing annotation tools may provide powerful features to cover various needs of text annotatio...
Chronic pain (CP) lasts for more than 3 months, causing prolonged physical and mental burdens to patients. According to the US Centers for Disease Control and Prevention, CP contributes to more than 500 billion US dollars yearly in direct medical cost plus the associated productivity loss. CP is complex in etiology and can occur anywhere in the bod...
BACKGROUND
A patient’s family history (FH) information significantly influences downstream clinical care. Despite this importance, there is no standardized way to capture FH information in electronic health records (EHR) and a substantial portion of FH information is frequently embedded in clinical notes. This renders FH information difficult to us...
Background
A patient’s family history (FH) information significantly influences downstream clinical care. Despite this importance, there is no standardized method to capture FH information in electronic health records and a substantial portion of FH information is frequently embedded in clinical notes. This renders FH information difficult to use i...
Exogenous estrogen is associated with reduced COVID mortality in non-immunosuppressed/immunocompromised (non-ISC) post-menopausal females. Here, we examined the association of estrogen or testosterone hormone replacement therapy (HRT) with COVID outcomes in solid organ transplant recipients (SOTR) compared to non-ISC individuals, given known differ...
Background
The incorporation of information from clinical narratives is critical for computational phenotyping. The accurate interpretation of clinical terms highly depends on their associated context, especially the corresponding clinical section information. However, the heterogeneity across different Electronic Health Record (EHR) systems poses...
Background: Clinical information retrieval (IR) plays a vital role in modern healthcare by facilitating efficient access and analysis of medical literature for clinicians and researchers. This scoping review aims to offer a comprehensive overview of the current state of clinical IR research and identify gaps and potential opportunities for future s...
Multiplex immunofluorescence (MxIF) images provide detailed information of cell composition and spatial context for biomedical research. However, compromised data quality could lead to research biases. Comprehensive image quality checking (QC) is essential for reliable downstream analysis. As a reliable and specific staining of cell nuclei, 4',6-di...
Objective: The generalizability of clinical large language models is usually ignored during the model development process. This study evaluated the generalizability of BERT-based clinical NLP models across different clinical settings through a breast cancer phenotype extraction task. Materials and Methods: Two clinical corpora of breast cancer pati...
Importance:
The effectiveness of triplet therapy compared with androgen pathway inhibitor (API) doublets in a heterogeneous patient population with metastatic castration-sensitive prostate cancer (mCSPC) is unknown.
Objective:
To assess the comparative effectiveness of contemporary systemic treatment options for patients with mCSPC across clinic...
Clinical documentation in electronic health records contains crucial narratives and details about patients and their care. Natural language processing (NLP) can unlock the information conveyed in clinical notes and reports, and thus plays a critical role in real-world studies. The NLP Working Group at the Observational Health Data Sciences and Info...
Depression is a widespread mental health issue, affecting an estimated 3.8% of the global population. It is also one of the main contributors to disability worldwide. Recently, it has become popular for individuals to use social media platforms (e.g., Reddit) to express their difficulties and health issues (e.g., depression) and seek support from o...
Context
Metformin is the first-line drug for treating diabetes but has a high failure rate.
Objective
To identify demographic and clinical factors available in the electronic health record (EHR) that predict metformin failure.
Methods
A cohort of patients with at least one abnormal diabetes screening test who initiated metformin was identified a...
Introduction: Despite the decline in overall mortality and incidence of cancer in the US population, disparities in cancer care still largely exist within certain groups. The proportion of racial and ethnic minorities recruited to participate in cancer research is persistently lower than the US population. The rapid adoption of electronic health re...
An increasing number of studies have reported using natural language processing (NLP) to assist observational research by extracting clinical information from electronic health records (EHRs). Currently, no standardized reporting guidelines for NLP‐assisted observational studies exist. The absence of detailed reporting guidelines may create ambigui...
Background
Covert cerebrovascular disease (CCD) has been shown to be associated with dementia in population‐based studies with magnetic resonance imaging (MRI) screening, but dementia risk associated with incidentally discovered CCD is not known.
Methods and Results
Individuals aged ≥50 years enrolled in the Kaiser Permanente Southern California h...
Real-time individual endpoint prediction has always been a challenging task but is of great clinical utility for both patients and healthcare providers. With 6,879 chronic kidney disease stage 4 (CKD4) patients as a use case, we explored the feasibility and performance of gated recurrent units with decay that model the Weibull probability density function...
Social determinants of health (SDoH) have a significant impact on health outcomes and well-being. Addressing SDoH is the key to reducing healthcare inequalities and transforming a sick care system into a health-promoting system. To address the SDoH terminology gap and better embed relevant elements in advanced biomedical informatics, we propose an...
BACKGROUND
Aspirin-exacerbated respiratory disease (AERD) is an acquired inflammatory condition characterized by the presence of asthma, chronic rhinosinusitis with nasal polyposis, and respiratory hypersensitivity reactions on ingestion of aspirin or other nonsteroidal anti-inflammatory drugs (NSAIDs). Despite AERD having a classic constellation o...
Background
Aspirin-exacerbated respiratory disease (AERD) is an acquired inflammatory condition characterized by the presence of asthma, chronic rhinosinusitis with nasal polyposis, and respiratory hypersensitivity reactions on ingestion of aspirin or other nonsteroidal anti-inflammatory drugs (NSAIDs). Despite AERD having a classic constellation o...
Background
The current COVID-19 pandemic and the previous SARS/MERS outbreaks of 2003 and 2012 have resulted in a series of major global public health crises. We argue that in the interest of developing effective and safe vaccines and drugs and to better understand coronaviruses and associated disease mechanisms it is necessary to integrate the lar...
Motivation:
Despite the increasing evidence of utility of genomic medicine in clinical practice, systematically integrating genomic medicine information and knowledge into clinical systems with a high-level of consistency, scalability, and computability remains challenging. A comprehensive terminology is required for relevant concepts and the asso...
Background
As one of the key criteria to differentiate benign vs. malignant tumors in ovarian and other solid cancers, tumor-stroma reaction (TSR) has long been observed by pathologists and has been found to correlate with patient prognosis. However, few studies have aimed to overcome subjective bias or automate TSR evaluation for enabling association analy...